(57) Abstract

The present invention is directed to a strategy to identify small peptides that activate any G protein coupled receptor (GPCR) or
inactivate any constitutively active GPCR by screening combinatorial peptide libraries. The invention comprises expressing a peptide of a
peptide library tethered to a GPCR of interest in a cell, and monitoring the cell to determine whether the peptide is an agonist or negative
antagonist of the GPCR of interest. The peptide is tethered to the GPCR by replacing the amino terminus of the GPCR with the amino
terminus of a self-activating receptor, and replacing the natural peptide ligand present in the amino terminus with the library peptide. In
one embodiment for discovery of agonists, a ligand of the self-activating receptor is used to cleave the resulting amino terminus to expose
the peptide of the peptide library. In another embodiment for discovery of agonists or negative antagonists, the GPCR construct ends
in the peptide so the peptide is always exposed. Preferably, the self-activating receptor is the thrombin receptor and the ligand of the
self-activating receptor is thrombin.

FIELD OF THE INVENTION
The present invention relates to drug 
discovery, and more particularly to a strategy to clone 
drugs for G protein coupled receptors. 



BACKGROUND OF THE INVENTION
Throughout this application various 
publications are referenced, many in parenthesis. Full 
citations for these publications are provided at the 
end of the Detailed Description. The disclosures of 
these publications in their entireties are hereby 
incorporated by reference in this application. 

It has been estimated that more than 50% of
the drugs in clinical use today are directed at G
protein coupled receptors (GPCRs). Small peptides can
activate a number of receptors of this family, such as
receptors for thyrotropin-releasing hormone (TRH),
which is a tripeptide (Gershengorn and Osman 1996),
thrombin, for which a hexapeptide is a full agonist
(Tapparelli et al. 1993), and formyl-Met-Leu-Phe, which
is a tetrapeptide (Perez et al. 1994). Small molecules
can also inactivate constitutively active GPCRs; for
example, benzodiazepines inactivate TRH receptor mutants
that are constitutively active (Heinflink et al.
1995). A constitutively active receptor is one that
signals in the absence of agonist.

It appears that these small molecules
interact primarily, if not exclusively, with the
transmembrane (TM) bundle or extracellular (EC) loops
of GPCRs (Cascieri et al. 1995). For example, it
appears that the "activation domain" of a GPCR with a
large EC amino terminus, such as the receptor for
calcitonin, is present within the region of the
receptor from the beginning of TM helix one to the
C-terminus, which includes the TM bundle and EC loops
(Stroop et al. 1995).

The discovery of peptides that could activate
GPCRs or inactivate constitutively active GPCRs may
have enormous potential for clinical applications
because a number of peptide agonists of GPCRs are
currently used therapeutically and diagnostically. In
the shorter term, the discovery of such peptides will
yield reagents that could be used by pharmaceutical
companies to identify ligands for or functions of
"orphan" receptors.

SUMMARY OF THE INVENTION
To this end, it is an object of the subject
invention to provide a strategy to discover small
peptides that will activate any G protein-coupled
receptor (GPCR) or inactivate any constitutively active
GPCR. These peptides could serve as lead chemicals for
design of clinically useful drugs or could be used to
identify the natural ligand or physiologic function of
"orphan" receptors, that is, putative receptors that
have been identified (i.e., cloned) but for which the
function is unknown. The strategy uses combinatorial
peptide libraries tethered to the GPCR. With this
approach, millions of random peptides of a given length
can be tested for activity in the context of a library
and those that activate GPCRs or inactivate
constitutively active GPCRs can be identified.

The invention thus provides a method of
identifying peptide agonists or negative antagonists of
a G protein coupled receptor of interest. The method
comprises expressing a peptide of a peptide library
tethered to a G protein coupled receptor of interest in
a cell, and monitoring the cell to determine whether
the peptide is an agonist or negative antagonist of the
G protein coupled receptor of interest.

In one embodiment for identifying peptide
agonists, the expression of a peptide of a peptide
library tethered to a G protein coupled receptor of
interest in a cell comprises preparing a G protein
coupled receptor construct, introducing the G protein
coupled receptor construct into a cell, allowing the
cell to express the G protein coupled receptor encoded
thereby, and exposing the cell to a ligand of a
self-activating receptor, wherein the ligand cleaves the G
protein coupled receptor construct so as to expose the
inserted peptide of the peptide library. The G protein
coupled receptor construct for identifying a peptide
agonist, which is also provided by the subject
invention, comprises a nucleic acid molecule encoding a
G protein coupled receptor with a deleted first amino
terminus; a nucleic acid molecule encoding a second
amino terminus of a self-activating receptor attached
to the nucleic acid molecule encoding the G protein
coupled receptor at the deleted first amino terminus,
the second amino terminus having a deleted portion
which is a peptide agonist for activating the
self-activating receptor; and a nucleic acid molecule
encoding the peptide of the peptide library inserted
into the second amino terminus and replacing the
deleted portion.

In a further embodiment for identifying
peptide negative antagonists, the G protein coupled
receptor of interest is a constitutively active G
protein coupled receptor and the expression of a
peptide of a peptide library tethered to the G protein
coupled receptor of interest in a cell comprises
preparing a constitutively active G protein coupled
receptor construct, introducing the constitutively
active G protein coupled receptor construct into a
cell, and allowing the cell to express the
constitutively active G protein coupled receptor
encoded thereby. The constitutively active G protein
coupled receptor construct for identifying a peptide
negative antagonist, which is also provided by the
subject invention, comprises a nucleic acid molecule
encoding a constitutively active G protein coupled
receptor with a deleted first amino terminus; a nucleic
acid molecule encoding a second amino terminus of a
self-activating receptor attached to the nucleic acid
molecule encoding the constitutively active G protein
coupled receptor at the deleted first amino terminus,
the second amino terminus having a deleted portion
which includes a peptide agonist for activating the
self-activating receptor as well as any amino acids
positioned amino terminally to the peptide agonist; and
a nucleic acid molecule encoding the peptide of the
peptide library inserted into the second amino terminus
and replacing the deleted portion.

In a still further embodiment for identifying
peptide agonists, the expression of a peptide of a
peptide library tethered to a G protein coupled
receptor of interest in a cell comprises preparing a G
protein coupled receptor construct, introducing the G
protein coupled receptor construct into a cell, and
allowing the cell to express the G protein coupled
receptor encoded thereby. The G protein coupled
receptor construct for identifying a peptide agonist,
which is also provided by the subject invention,
comprises a nucleic acid molecule encoding a G protein
coupled receptor with a deleted first amino terminus; a
nucleic acid molecule encoding a second amino terminus
of a self-activating receptor attached to the nucleic
acid molecule encoding the G protein coupled receptor
at the deleted first amino terminus, the second amino
terminus having a deleted portion which includes a
peptide agonist for activating the self-activating
receptor as well as any amino acids positioned amino
terminally to the peptide agonist; and a nucleic acid
molecule encoding the peptide of the peptide library
inserted into the second amino terminus and replacing
the deleted portion.

BRIEF DESCRIPTION OF THE DRAWINGS
These and other features and advantages of
this invention will be evident from the following
detailed description of preferred embodiments when read
in conjunction with the accompanying drawings in which:
Fig. 1 is a diagram of a G protein coupled
receptor;
Fig. 2 is a diagram of a thrombin receptor;
Fig. 3 is a diagram of a peptide of a peptide
library;
Fig. 4 is a diagram of a G protein coupled
receptor construct according to the subject invention;
Fig. 5 is a diagram of a constitutively
active G protein coupled receptor construct according
to the subject invention;
Fig. 6 is a diagram of the putative
two-dimensional topology of the human calcitonin receptor;
Fig. 7 is a diagram of the putative
two-dimensional topology of the human herpesvirus-8 GPCR;
Fig. 8 is a diagram of the putative
two-dimensional topology of the chimera ThrR/HHV8 GPCR as
it is predicted to be in the cell surface membrane of
transfected COS-1 cells; and
Fig. 9 is a plasmid map of
pcDNA3 PROLACFLAGhTHRR/hFSHR.

DETAILED DESCRIPTION 

The invention provides a strategy that is
designed to discover small peptides that will activate
any G protein-coupled receptor (GPCR) or inactivate any
constitutively active GPCR. A constitutively active
receptor is one that signals in the absence of an
agonist. These peptides could serve as lead chemicals
for design of clinically useful drugs or could be used
to identify the natural ligand or physiologic function
of "orphan" receptors, that is, putative receptors that
have been identified (cloned) but for which the
function is unknown. The discovery of peptides that
could serve these functions may be accomplished with an
approach that uses combinatorial peptide libraries.
With this approach, millions of random peptides of a
given length are tested for activity in the context of
a library and those that activate GPCRs or inactivate
constitutively active GPCRs are discovered. As stated
above, this approach may have enormous potential for
clinical applications because a number of peptide
agonists of GPCRs are currently used therapeutically
and diagnostically. In the shorter term, however, this
technology will yield reagents that could be used by
pharmaceutical companies to identify ligands for or
functions of "orphan" receptors.

To discover small peptides that can serve as
agonists for GPCRs, a combinatorial peptide library is
constructed that expresses random pentapeptides
tethered to the seven TM helical bundle of any GPCR. A
pentapeptide library was chosen because TRH is a
tripeptide that is blocked at both ends (3 residues
plus 2 for the blocking groups = 5) and because the
resulting number of clones is workable.
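
As a rough check on why a short peptide keeps the number of clones workable, the size of a fully random library can be computed from the peptide length. The Python sketch below assumes that every position may hold any of the 20 natural amino acids; the function name `library_complexity` is illustrative and does not come from the disclosure.

```python
# Number of distinct peptides in a fully random library, assuming each
# position can hold any of the 20 natural amino acids.
NATURAL_AMINO_ACIDS = 20


def library_complexity(peptide_length: int) -> int:
    """Return the number of distinct peptides of the given length."""
    if peptide_length < 1:
        raise ValueError("peptide_length must be at least 1")
    return NATURAL_AMINO_ACIDS ** peptide_length


if __name__ == "__main__":
    # Compare a few candidate peptide lengths side by side.
    for length in range(3, 8):
        print(f"{length}-mer library: {library_complexity(length):,} peptides")
```

Each added residue multiplies the number of clones to be screened by 20, so the choice of peptide length directly sets the screening burden.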

The library contains all 20 natural amino
acids at each of the five positions and therefore has a
complexity of 20^5 = 3.2 x 10^6 possible combinations. To
this end the complementary DNA (cDNA) sequence that
normally encodes any GPCR's N-terminal EC domain is
substituted by a DNA sequence that encodes the
N-terminal ectodomain of a self-activating receptor such
as the thrombin receptor. Thrombin receptor (ThrR) is
a GPCR that is activated by a mechanism that is
different from most GPCRs. Thrombin is a serine
protease that binds to and cleaves its receptor's
N-terminal end at a specific site, exposing a new
N-terminus that acts as a peptide agonist tethered to the
remainder of the receptor molecule. The chimeric
ThrR/GPCR has the variable pentapeptide sequence
substituting for the native peptide sequence that is
normally unmasked by thrombin action and constitutes
the ThrR peptide agonist, but retains thrombin binding
sequences and the thrombin-specific cleavage site.
Therefore, the N-terminus of expressed receptors is
cleaved by thrombin at the appropriate location,
exposing a new N-terminus that is made of the
pentapeptide segment of the library tethered to the
remainder of the GPCR. As used herein, a receptor that
operates in this manner is referred to as a
self-activating receptor since a ligand of the receptor
cleaves the receptor to expose a natural peptide
agonist. The thrombin receptor is the
most well known of such self-activating receptors, but
the invention can be readily practiced using other such
receptors (e.g., the protease activated receptor or
[...]).

The synthetic N-terminus
of the chimeric ThrR/GPCR, consisting of a prolactin
leader or signal peptide, followed by the FLAG epitope,
followed by the N-terminus of the mature human ThrR,
where the pentapeptide library is constructed, is
constructed by gene synthesis. The cDNA
consists of a DNA segment of approximately 300 base
pairs encoding 100 amino acids that is ligated in frame
through an appropriate restriction endonuclease
cleavage site created by polymerase chain reaction
(PCR) in the cDNA of any GPCR at a position encoding
the amino acids that constitute the transition between
the N-terminus and the first TM domain. After ligation
into a mammalian expression vector, Escherichia coli is
transformed by electroporation and the transformants
are subdivided into pools whose maximal workable
complexity is determined according to the efficiency of
mammalian cell transfection and/or sensitivity of the
detection system.

Amplified reporter systems based on the
second messenger systems triggered by the GPCR are
used. For discovery of agonists, the assay is based on
gene induction in COS-1 cells using β-galactosidase as
a reporter gene in a single cell assay. This assay
takes advantage of the amplification of the enzyme
activity of the reporter, with an easily determined
color reaction as endpoint, and of the expression of a
single receptor clone with its tethered agonist in
COS-1 cells because of replication of the plasmids
introduced. The signal is increased because the
construct used has a nuclear localization signal
ligated to the β-galactosidase that allows the protein
to concentrate in the nucleus (Hersh et al. 1995).
Single clones that exhibit activation of chimeric
ThrR/GPCR after thrombin addition to cleave the
N-terminus and expose the tethered agonist, as measured
by increased color reaction, are isolated using sib
selection, which consists of successive subdivision and
amplification of positive pools of clones. A number of
other reporter systems can also be used. These
include, but are not limited to, analysis of acute
effects of agonist using Xenopus laevis oocytes in
which one measures changes in membrane conductance,
using calcium-activated chloride conductance for
phosphoinositide (PI) cascade or cAMP-activated
chloride conductance through cystic fibrosis
transmembrane regulator (CFTR) that is co-expressed for
cAMP cascade; induction of genes in COS-1 cells that
yield protein products that are displayed in the
cytoplasm or on the surfaces of cells and visualized by
immunofluorescence (by microscopy or fluorescence
activated cell sorting) or immunocytochemistry; and
analysis of acute effects on elevation of cytoplasmic
calcium using fluorescence indicators.
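
The sib selection step described above, successive subdivision of positive pools until single clones remain, can be sketched as a recursive search. This is a minimal illustration under stated assumptions: the pool assay is reduced to a yes/no callback (`pool_is_positive`), amplification between rounds is not modelled, and all names and the pool count of 10 are hypothetical rather than taken from the disclosure.

```python
from typing import Callable, List, Sequence


def split_into_pools(clones: Sequence[str], n_pools: int) -> List[List[str]]:
    """Divide clones into at most n_pools non-empty pools of similar size."""
    n_pools = min(n_pools, len(clones))
    return [list(clones[i::n_pools]) for i in range(n_pools)]


def sib_select(
    clones: Sequence[str],
    pool_is_positive: Callable[[Sequence[str]], bool],
    n_pools: int = 10,
) -> List[str]:
    """Return the single clones reached by following positive pools down."""
    if n_pools < 2:
        raise ValueError("n_pools must be at least 2 so that pools shrink")
    # A negative (or empty) pool is discarded without further work.
    if not clones or not pool_is_positive(clones):
        return []
    # A positive pool of one clone is an isolated hit.
    if len(clones) == 1:
        return list(clones)
    hits: List[str] = []
    for pool in split_into_pools(clones, n_pools):
        hits.extend(sib_select(pool, pool_is_positive, n_pools))
    return hits


if __name__ == "__main__":
    library = [f"clone_{i}" for i in range(1000)]
    active = {"clone_42", "clone_777"}  # hypothetical agonist-bearing clones

    def assay(pool: Sequence[str]) -> bool:
        # Stand-in for the reporter readout of a transfected pool.
        return any(clone in active for clone in pool)

    print(sorted(sib_select(library, assay)))
```

Because negative pools are dropped at each round, the number of assays grows roughly with the logarithm of the pool size rather than with the number of clones, which is what makes pooled screening of a large library practical.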

To discover small peptides that can serve as
agonists or small peptides that can serve as negative
antagonists (or inverse agonists) for GPCRs, a second
type of combinatorial peptide library is constructed
that expresses random pentapeptides tethered to the
seven TM helical bundle of a given GPCR that is
different from the one described above to discover
agonists but is based on the same principles. This
library also contains all 20 natural amino acids at
each of the five positions and therefore has a
complexity of 20^5 = 3.2 x 10^6 possible combinations. In
this library, however, the cDNA sequence that normally
encodes the GPCR's N-terminal EC domain is substituted by a
DNA sequence that encodes the self-activating
receptor's (e.g., the thrombin receptor's) N-terminal
ectodomain but without the domain that usually is
cleaved to reveal the tethered peptide. In this
library, the chimeric
ThrR/GPCR has the variable pentapeptide sequence
substituting for the native peptide sequence that is
normally unmasked by thrombin action exposed as the
N-terminus of all receptors. Therefore, the N-terminus
of expressed receptors is a random pentapeptide that
can act as an agonist of a GPCR or as a negative
antagonist with regard to the constitutive activity of
some GPCRs. With regard to the negative antagonists,
in contrast to looking for stimulation of a GPCR
signalling response, monitoring is for inactivation of
a "basal" activity.

A two-reporter system is used for discovery
of negative antagonists. The second reporter gene is
used to identify cells that have been transfected and
are expressing foreign proteins and to distinguish them
from cells that have not been transfected and are not
expressing foreign proteins. This distinction is
crucial for this approach because it is necessary to
differentiate between cells that have the capacity to
express the specific reporter gene but do not because
transcription has been inhibited and cells that do not
express the reporter gene because they are not
transfected. The same reporter genes for
GPCR-specific effects as for the discovery of agonist
peptides are used. The nonspecific reporter for
transfection is a construct containing a mutant of the
human placental alkaline phosphatase gene (Tate et al.
1990) that is targeted to the cytoplasm under the
control of a cytomegalovirus promoter. Thus, one can
monitor for 3 types of cells: 1) cells in which
β-galactosidase is expressed at high levels in the
nucleus and alkaline phosphatase is expressed in the
cytoplasm: these are transfected cells that do not
express receptors that contain a peptide that has
negative antagonistic activity, because expression of
β-galactosidase is induced by the constitutive signalling
activity of the GPCR; 2) cells in which β-galactosidase
is not expressed in the nucleus and alkaline
phosphatase is not expressed in the cytoplasm: these
are cells that have not been transfected; and 3) cells
in which β-galactosidase is not expressed or is
expressed at low levels in the nucleus and alkaline
phosphatase is expressed in the cytoplasm: these are
transfected cells that express receptors that contain a
peptide that has negative antagonistic activity. The
approach to sib selection is identical to that outlined
above.
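
The three cell types distinguished by the two-reporter system amount to a small decision table, sketched below in Python. The boolean inputs are an assumed simplification of the color readouts (in practice expression is graded), and the fourth combination, high β-galactosidase without alkaline phosphatase, is not addressed in the text, so it is flagged here as unexpected. All identifiers are illustrative.

```python
from enum import Enum


class CellClass(Enum):
    NO_NEGATIVE_ANTAGONIST = "transfected; receptor still constitutively active"
    UNTRANSFECTED = "not transfected"
    NEGATIVE_ANTAGONIST = "transfected; peptide has negative antagonistic activity"
    UNEXPECTED = "reporter pattern not covered by the three described cases"


def classify_cell(beta_gal_high: bool, alk_phos_present: bool) -> CellClass:
    """Classify one cell from its two reporter readouts.

    beta_gal_high: nuclear beta-galactosidase is expressed at high levels.
    alk_phos_present: cytoplasmic alkaline phosphatase is expressed.
    """
    if alk_phos_present and beta_gal_high:
        return CellClass.NO_NEGATIVE_ANTAGONIST  # case 1
    if alk_phos_present:
        return CellClass.NEGATIVE_ANTAGONIST  # case 3: absent or low beta-gal
    if not beta_gal_high:
        return CellClass.UNTRANSFECTED  # case 2
    # High beta-gal without the transfection marker is not described above.
    return CellClass.UNEXPECTED


if __name__ == "__main__":
    for beta_gal in (True, False):
        for alk_phos in (True, False):
            result = classify_cell(beta_gal, alk_phos)
            print(f"beta-gal high={beta_gal}, AP={alk_phos}: {result.value}")
```

The sketch makes the role of the second reporter explicit: without the alkaline phosphatase readout, cases 2 and 3 would be indistinguishable.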

A yeast (Saccharomyces cerevisiae) bioassay
system that is responsive to activation of GPCRs or to
inactivation of constitutively active GPCRs can also be
used to screen the tethered, combinatorial peptide
library. This bioassay is based on the finding that
mammalian GPCRs expressed in yeast will regulate the
endogenous signal transduction cascades (Dohlman et al.
1991), in particular the pathway for regulation of
proliferation (King et al. 1990). A sensitive and
specific yeast expression system permits powerful
genetic selection methods, which use modifications in
the endogenous pheromone response pathways (Price et
al. 1995; Price et al. 1996), to be developed for use
with the screening methods of the subject invention.
The pheromone signalling cascade in yeast uses one of
two GPCRs [for α (STE2) or a mating factor (STE3)] to
couple to a heterotrimeric G protein, which is
comprised of α (GPA1), β (STE4) and γ (STE18) subunits, to
activate a protein kinase signalling cascade that leads
to cell cycle arrest, which is mediated by FAR1, and
activation of pheromone-responsive genes, such as FUS1.
SST2 is another important member of this signalling
pathway because it serves to desensitize (or "turn
off") the pathway. Several members of this pathway can
be modified to improve the sensitivity and assay of
GPCRs. This system provides markedly greater ease of
assay and permits the screening of hundreds of
thousands of recombinant GPCR clones simultaneously.
Systems can be developed that can be used to screen for
agonist and negative antagonist probes/drugs. The
major advantage of this type of assay system over those
usually employed to screen numerous potential
probes/drugs rapidly, which is necessary for the
application of the method of the subject invention, is
that it relies on a response in a single yeast cell and
will identify the responsive cell in a population of
millions of cells.

One assay will be a minor modification of the
previously published yeast expression system to assay
for activation of GPCRs in which FAR1 and SST2 genes
were inactivated and a FUS1-HIS3 gene is used for
selection of cells expressing activated GPCRs on a
medium deficient in histidine (Price et al. 1995). The
changes will involve only adapting the system so that
it will allow high efficiency transformation of yeast
cells with a library that contains 3.2 million
different GPCRs. The second assay will be modified
more extensively so that it will measure constitutively
activated GPCRs that are inactivated. One approach to
this type of assay will involve using yeast cells in
which the FAR1 gene is intact so that constitutively
active GPCRs will cause cells to be arrested in the
cell cycle. Cells in which the GPCR has been
inactivated will not exhibit growth arrest but will
proliferate as normal haploid cells in the absence of
mating factor.
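
The selection logic of the two yeast assays described above can be summarized in a short Python sketch. It is a deliberately coarse model: receptor signalling is treated as on or off, a negative antagonist peptide is assumed to override any other activity, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YeastClone:
    """Simplified, hypothetical description of one transformed yeast cell."""

    name: str
    constitutively_active: bool  # receptor signals without any agonist
    peptide_is_agonist: bool  # tethered peptide activates the receptor
    peptide_is_negative_antagonist: bool  # tethered peptide shuts it off


def is_signalling(clone: YeastClone) -> bool:
    """Assume a negative antagonist peptide overrides any activity."""
    if clone.peptide_is_negative_antagonist:
        return False
    return clone.constitutively_active or clone.peptide_is_agonist


def grows_in_agonist_assay(clone: YeastClone) -> bool:
    """FAR1 and SST2 inactivated, FUS1-HIS3 reporter, no histidine.

    Only a signalling receptor induces HIS3, so only those cells grow.
    """
    return is_signalling(clone)


def grows_in_negative_antagonist_assay(clone: YeastClone) -> bool:
    """FAR1 intact: signalling arrests the cell cycle, silence permits growth."""
    return not is_signalling(clone)


if __name__ == "__main__":
    clones = [
        YeastClone("silent receptor, agonist peptide", False, True, False),
        YeastClone("silent receptor, inert peptide", False, False, False),
        YeastClone("active receptor, inert peptide", True, False, False),
        YeastClone("active receptor, negative antagonist", True, False, True),
    ]
    for clone in clones:
        print(
            f"{clone.name}: "
            f"agonist assay growth={grows_in_agonist_assay(clone)}, "
            f"negative antagonist assay growth="
            f"{grows_in_negative_antagonist_assay(clone)}"
        )
```

In both assays the clone of interest is the one that grows, which is what allows a single responsive cell to be recovered from a very large population. The negative antagonist assay is meaningful only when the library is built on a constitutively active receptor, since a silent receptor would also permit growth.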

The invention thus provides a method of
identifying peptide agonists or negative antagonists of
a G protein coupled receptor of interest. The method
comprises expressing a peptide of a peptide library
tethered to a G protein coupled receptor of interest in
a cell, and monitoring the cell to determine whether
the peptide is an agonist or negative antagonist of the
G protein coupled receptor of interest.

In one embodiment for identifying peptide
agonists, the expression of a peptide of a peptide
library tethered to a G protein coupled receptor of
interest in a cell comprises preparing a G protein
coupled receptor construct, introducing the G protein
coupled receptor construct into a cell, allowing the
cell to express the G protein coupled receptor encoded
thereby, and exposing the cell to a ligand of a
self-activating receptor, wherein the ligand cleaves the G
protein coupled receptor construct so as to expose the
inserted peptide of the peptide library. The G protein
coupled receptor construct for identifying a peptide
agonist, which is also provided by the subject
invention, comprises a nucleic acid molecule encoding a
G protein coupled receptor with a deleted first amino
terminus; a nucleic acid molecule encoding a second
amino terminus of a self-activating receptor attached
to the nucleic acid molecule encoding a G protein
coupled receptor at the deleted first amino terminus,
the second amino terminus having a deleted portion
which is a peptide agonist for activating the
self-activating receptor; and a nucleic acid molecule
encoding the peptide of the peptide library inserted
into the second amino terminus and replacing the
deleted portion.

One embodiment of a G protein coupled
receptor construct for identifying a peptide agonist of
the G protein coupled receptor is shown in Fig. 4.
Referring to Figs. 1-4, the construct involves three
parts based on a nucleic acid molecule encoding a G
protein coupled receptor (10) (Fig. 1), a nucleic acid
molecule encoding a thrombin receptor (12) (Fig. 2), and
a nucleic acid molecule encoding a peptide (14) of a
peptide library (Fig. 3). Referring to Fig. 1, the G
protein coupled receptor (10) includes an amino
terminus (16). Referring to Fig. 2, the thrombin
receptor (12) also includes an amino terminus (18).
Within the amino terminus (18) of the thrombin receptor
(12) is a portion (20) which is a peptide agonist for
the thrombin receptor. When the thrombin receptor is
exposed to thrombin, thrombin cleaves the amino
terminal part of the molecule (22) leaving the portion
(20) which is a peptide agonist exposed. The portion
(20) reacts with the remainder of the thrombin receptor
molecule and binds thereto, activating the thrombin
receptor. Referring to Fig. 3, the peptide (14) of a
peptide library is shown.

Fig. 4 shows one embodiment of the G protein
coupled receptor construct for identifying a peptide
agonist according to the subject invention positioned
within a cellular membrane (24). The construct
includes a nucleic acid molecule encoding the G protein
coupled receptor (10) but a portion of the nucleic acid
molecule which encodes the amino terminus of the
receptor is deleted. In its place, the amino terminus
(18) of the thrombin receptor is inserted. Within the
amino terminus (18) of the thrombin receptor, the
portion which is a peptide agonist has been deleted and
replaced with the peptide (14) of the peptide library.
Thus, the G protein coupled receptor construct has the
backbone of a selected G protein coupled receptor, with
an amino terminus of the thrombin receptor. However,
the normal peptide agonist of the thrombin receptor has
been replaced by a peptide library. Thus, when the G
protein coupled receptor construct of the subject
invention is exposed to thrombin, thrombin will cleave
the amino terminal part (22) of the construct leaving
the peptide (14) of the peptide library exposed. If
the exposed peptide is an agonist of the G protein
coupled receptor, the receptor will be turned on.

In a further embodiment for identifying a
peptide negative antagonist, the G protein coupled
receptor of interest is a constitutively active G
protein coupled receptor and the expression of a
peptide of a peptide library tethered to the G protein
coupled receptor of interest in a cell comprises
preparing a constitutively active G protein coupled
receptor construct, introducing the constitutively
active G protein coupled receptor construct into a
cell, and allowing the cell to express the
constitutively active G protein coupled receptor
encoded thereby. The constitutively active G protein
coupled receptor construct for identifying a peptide
negative antagonist, which is also provided by the
subject invention, comprises a nucleic acid molecule
encoding a constitutively active G protein coupled
receptor with a deleted first amino terminus; a nucleic
acid molecule encoding a second amino terminus of a
self-activating receptor attached to the nucleic acid
molecule encoding the constitutively active G protein
coupled receptor at the deleted first amino terminus,
the second amino terminus having a deleted portion
which includes a peptide agonist for activating the
self-activating receptor as well as any amino acids
positioned amino terminally to the peptide agonist; and
a nucleic acid molecule encoding the peptide of the
peptide library inserted into the second amino terminus
and replacing the deleted portion.

The constitutively active G protein coupled
receptor construct for identifying a peptide negative
antagonist of the constitutively active G protein
coupled receptor is shown in Fig. 5, positioned within
a cellular membrane (24). The construct includes a
nucleic acid molecule encoding the G protein coupled
receptor (10) but a portion of the nucleic acid
molecule which encodes the amino terminus of the
receptor is deleted. In its place, the amino terminus
(18) of the thrombin receptor is inserted. Within the
amino terminus (18) of the thrombin receptor, the
portion which is a peptide agonist has been deleted, as
well as any amino acids positioned amino terminally to
the peptide agonist which are normally cleaved by
thrombin, and replaced with the peptide (14) of the
peptide library. Thus, the constitutively active G
protein coupled receptor construct has the backbone of
a selected G protein coupled receptor, with an amino
terminus of the thrombin receptor. However, the normal
peptide agonist of the thrombin receptor has been
replaced by a peptide library and the peptide is always
exposed. If the exposed peptide is a negative
antagonist of the constitutively active G protein
coupled receptor, the receptor will be turned off by
the exposed peptide.

In a still further embodiment for identifying
peptide agonists, the expression of a peptide of a
peptide library tethered to a G protein coupled
receptor of interest in a cell comprises preparing a G
protein coupled receptor construct, introducing the G
protein coupled receptor construct into a cell, and
allowing the cell to express the G protein coupled
receptor encoded thereby. The G protein coupled
receptor construct for identifying a peptide agonist,
which is also provided by the subject invention,
comprises a nucleic acid molecule encoding a G protein
coupled receptor with a deleted first amino terminus; a
nucleic acid molecule encoding a second amino terminus
of a self-activating receptor attached to the nucleic
acid molecule encoding the G protein coupled receptor
at the deleted first amino terminus, the second amino
terminus having a deleted portion which includes a
peptide agonist for activating the self-activating
receptor as well as any amino acids positioned amino
terminally to the peptide agonist; and a nucleic acid
molecule encoding the peptide of the peptide library
inserted into the second amino terminus and replacing
the deleted portion. This G protein coupled receptor
construct for identifying a peptide agonist of a G
protein coupled receptor has the same structure as the
construct shown in Fig. 5 except that the G protein
coupled receptor (10) is not a constitutively active
receptor.

The Examples which follow relate to 
25 particular GPCRs, such as the human calcitonin 

receptor, the human follicle-stimulating hormone 
receptor, and a GPCP of human herpesvirus -8 . However, 
as should be readily apparent to those of ordinary 
skill in the art, this invention is equally applicable 
30 to any GPCR. GPCRs are the largest family of cell 

surface receptors and act indirectly to regulate the 
activity of a separate plasma membrane -bound target 
protein, which can be an enzyme or an ion channel. The 
interaction between the receptor and the target protein 
35 is mediated by a third protein, called a trimeric GTP- 
binding regulatory protein {G protein) . The activation 
of the target protein either alters the concentration of
one or more intracellular mediators (if the target
protein is an enzyme) or alters the ion permeability of
the plasma membrane (if the target protein is an ion
channel).

GPCRs include, for example, the alpha-
adrenergic receptors, the beta-adrenergic receptors,
dopaminergic receptors, serotonergic receptors,
muscarinic cholinergic receptors, peptidergic
receptors, and the thyrotropin releasing hormone
receptor. GPCRs are characterized by a seven
transmembrane-spanning topology (see Figs. 1, 2, 4-8).
As used herein, the amino terminus of a GPCR refers to
that portion of the GPCR which is extracellular,
extending from the amino end of the GPCR to the first
transmembrane domain (the amino terminus is depicted in
Figs. 4 and 5).

The various G protein coupled receptor
constructs of the subject invention include the amino
terminus of a self-activating receptor as defined
herein. In one embodiment, the self-activating
receptor is the thrombin receptor. The amino acid
sequence of this amino terminus of the thrombin
receptor is shown in SEQ ID NO:1, with amino acid
residues 9 to 13 of SEQ ID NO:1 representing the
natural peptide agonist of the thrombin receptor.
These residues (9 to 13 of SEQ ID NO:1) are replaced
with the peptide library in accordance with the subject
invention. In one embodiment of the G protein coupled
receptor construct, the amino acids normally cleaved by
thrombin (residues 1 to 8 of SEQ ID NO:1) are also
replaced by the peptide of the peptide library.

SEQ ID NO:1:

LDATLDPRSFLLRNPNDKYEPFWEDEEKNESGLTEYRLVSINKSSPLQK
QLPAFISEDASGYL

In one embodiment discussed in the Examples,
the G protein coupled receptor construct is of a human
calcitonin receptor (see Fig. 6). The human calcitonin
receptor construct according to the subject invention
has an amino acid sequence as shown in SEQ ID NO: 44,
wherein amino acid residues 47 to 51 of SEQ ID NO: 44
are the peptide of a peptide library, amino acid
residues 1 to 101 of SEQ ID NO: 44 are the second amino
terminus, and amino acid residues 102 to 429 of SEQ ID
NO: 44 correspond to the human calcitonin receptor with
the first amino terminus deleted.

SEQ ID NO: 44: 

1 5 wEDE:^KNESGI.TEYRLVSIlIKSSPLQKQLPAFISBDASGyLVLYYIJVIVGHSLS IPTLVI 

SMIFVFFRSLGCQRVTLHKNMFLTYILNSMIIIIHLVEVVPNGELVRRDPVSCKILHFF 
HQYl#!ACNYFWMI^GiyiJn'LIWAVFTEKQRIJlWYyU^WGFPLVPTTIHAITRAVYF 
NDNCWLSVETHIXYIIHGPVMAALVVNFFFLUIIVRVLVTKMRETHEAESHrmiKA^^ 
MILVPIJ.GIQFWFPWRPSNKMLGKITOYVMHSI.IHFQGFFVATIYCFCNNEVQTTVKRQ 

2 0 WAQPKIQWNQRWGRHPSNRSARAAAAAAEAGDIPIYICHQEI.RNEPANNQGEESAEIIPL 

NIIEQESSA 



In a further embodiment discussed in the
Examples, the G protein coupled receptor construct is
of a human follicle stimulating hormone receptor. The
human follicle stimulating hormone receptor construct
has an amino acid sequence as shown in SEQ ID NO: 2,
wherein amino acid residues 47 to 51 of SEQ ID NO: 2 are
the peptide of a peptide library, amino acid residues
39 to 101 of SEQ ID NO: 2 are the second amino terminus,
and amino acid residues 102 to 436 of SEQ ID NO: 2
correspond to the human follicle stimulating hormone
receptor with the first amino terminus deleted.

MDSKGSSQKGSRLLLLLVVSNLLLCQGVVSDYKDDDDKLDATLDPRXXXXXNPNDKYEPFWEDEEK
NBSGLTBYRLVSINKSSPLQKQLPAFISEDASGYLGVNILRVLIWFISIIAITGNIIVLVILTTSQ 
yia.TVPRFLMCtreJlFADLCIGIYLIiIASVDIHTKSQYHNYAlDWQTGAGCDAAGFFTVFASBLSV 
YTLTAITLERWHTITHAMQU)CKVQIJUIAASVMVMGWIFAFAAALFPIFGISSYMKVSI^ 
5 sPLSQLYVMSIiVIJmAFWIOGCYIHIYLTVamPNlVCSSSDTRIAKRMAMLIETDFLC^ 

PFAISASIJCVPLITVSKAKILLVLFHPINSOUIPFLYAIPTKIIFRRDFFILLSKCGCYEMQAQIYR 

TETSSTVHNTHPRNGHCSSAPRVTMGSTYILVPLSHLAQN 

As used herein, the term "as shown in" when
used in conjunction with a SEQ ID NO for a nucleotide
sequence refers to a nucleotide sequence which is
substantially the same nucleotide sequence, or
derivatives thereof (such as deletion and hybrid
variants thereof, splice variants thereof, etc.).
Nucleotide additions, deletions, and/or substitutions,
such as those which do not affect the translation of
the DNA molecule, are within the scope of a nucleotide
sequence as shown in a particular nucleotide sequence
(i.e. the amino acid sequence encoded thereby remains
the same). Such additions, deletions, and/or
substitutions can be, for example, the result of point
mutations made according to methods known to those
skilled in the art. It is also possible to substitute
a nucleotide which alters the amino acid sequence
encoded thereby, where the amino acid substituted is a
conservative substitution or where amino acid homology
is conserved. It is also possible to have minor
nucleotide additions, deletions, and/or substitutions
which do not alter the function of the resulting GPCR.
These are also within the scope of a nucleotide
sequence as shown in a particular nucleotide sequence.

Similarly, the term "as shown in" when used
in conjunction with a SEQ ID NO for an amino acid
sequence refers to an amino acid sequence which is
substantially the same amino acid sequence or
derivatives thereof. Amino acid additions, deletions,
and/or substitutions which do not negate the ability of
the resulting protein (or peptide) to form a functional
protein (or peptide) are within the scope of an amino
acid sequence as shown in a particular amino acid
sequence. Such additions, deletions, and/or
substitutions can be, for example, the result of point
mutations in the DNA encoding the amino acid sequence,
such point mutations made according to methods known to
those skilled in the art. Substitutions may be
conservative substitutions of amino acids. Two amino
acid residues are conservative substitutions of one
another, for example, where the two residues are of the
same type. In this regard, alanine, valine, leucine,
isoleucine, glycine, cysteine, phenylalanine,
tryptophan, methionine, and proline, all of which are
nonpolar residues, are of the same type. Serine,
threonine, tyrosine, asparagine, and glutamine, all of
which are uncharged polar residues, are of the same
type. Another type of residue is the positively
charged (basic) polar amino acid residue, which
includes histidine, lysine, and arginine. Aspartic
acid and glutamic acid, both of which are negatively
charged (acidic) polar amino acid residues, form yet
another type of residue. Further descriptions of the
concept of conservative substitutions are given by
French and Robson 1983, Taylor 1986, and Bordo and
Argos 1991.
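
By way of illustration only, the residue types recited above can be written as a lookup table. The following Python sketch assumes the four groups exactly as listed in the preceding paragraph; the names RESIDUE_CLASSES and is_conservative are hypothetical and form no part of the disclosure.

```python
# Residue types as grouped in the preceding paragraph (one-letter codes).
# The grouping is taken from the text; the helper names are illustrative.
RESIDUE_CLASSES = {
    "nonpolar": "AVLIGCFWMP",
    "uncharged polar": "STYNQ",
    "basic": "HKR",
    "acidic": "DE",
}

# Invert the table so each residue maps to the name of its type.
RESIDUE_CLASS = {
    residue: name
    for name, members in RESIDUE_CLASSES.items()
    for residue in members
}


def is_conservative(a: str, b: str) -> bool:
    """Return True when both residues are of the same type."""
    return RESIDUE_CLASS[a.upper()] == RESIDUE_CLASS[b.upper()]


if __name__ == "__main__":
    # Leucine and isoleucine are both nonpolar; lysine is basic, glutamate acidic.
    print(is_conservative("L", "I"))
    print(is_conservative("K", "E"))
```

Under this simplified test, a substitution is treated as conservative only when both residues are of the same recited type; the cited references describe finer-grained criteria.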

As further used herein, the term "as shown
in" when used in conjunction with a SEQ ID NO for a
nucleotide or amino acid sequence is intended to cover
linear or cyclic versions of the recited sequence
(cyclic referring to entirely cyclic versions or
versions in which only a portion of the molecule is
cyclic, including, for example, a single amino acid
cyclic upon itself), and is intended to cover
derivative or modified nucleotide or amino acids within
the recited sequence. For example, those skilled in
the art will readily understand that an adenine
nucleotide could be replaced with a methyladenine, or a
cytosine nucleotide could be replaced with a
methylcytosine, if a methyl side chain is desirable.
Nucleotide sequences having a given SEQ ID NO are
intended to encompass nucleotide sequences containing
these and like derivative or modified nucleotides, as
well as cyclic variations. As a further example, those
skilled in the art will readily understand that an
asparagine residue could be replaced with an
ethylasparagine if an ethyl side chain is desired, a
lysine residue could be replaced with a hydroxylysine
if an OH side chain is desired, or a valine residue
could be replaced with a methylvaline if a methyl side
chain is desired. Amino acid sequences having a given
SEQ ID NO are intended to encompass amino acid
sequences containing these and like derivative or
modified amino acids, as well as cyclic variations.
Cyclic, as used herein, also refers to cyclic versions
of the derivative or modified nucleotides and amino
acids.

As further used herein, a nucleic acid
molecule can be deoxyribonucleic acid (DNA) or
ribonucleic acid (RNA), the latter including messenger
RNA (mRNA). The nucleic acid can be genomic or
recombinant, biologically isolated or synthetic.

The DNA molecule can be a cDNA molecule,
which is a DNA copy of an mRNA encoding the protein.

The G protein coupled receptor construct of
the subject invention can be expressed in suitable host
cells using conventional techniques. Any suitable host
and/or vector system can be used to express the
GPCR construct. For in vitro expression, bacterial
hosts (for example, Escherichia coli) and mammalian
hosts (for example, COS cells) are preferred. For
screening using the GPCR construct in which the
inserted peptide is always exposed, yeast cells are
preferred. The use of yeast cells as a host for
expression of the GPCR construct allows for the
screening for negative antagonists of constitutively
active GPCRs or for the screening of agonists of GPCRs.
Expression of the construct is desirable to identify
peptide agonists and negative antagonists of the GPCR,
which can then be used for study and/or research
purposes, as well as for therapy of inherited or
acquired human disorders related to GPCR function.

Techniques for introducing the construct into
the host cells may involve the use of expression
vectors which comprise the nucleic acid molecule
encoding the construct. These expression vectors (such
as plasmids and viruses) can then be used to introduce
the nucleic acid molecule into suitable host cells.
For example, DNA encoding the construct can be injected
into the nucleus of a host cell or transformed into the
host cell using a suitable vector, or mRNA encoding the
construct can be injected directly into the host cell,
in order to obtain expression of the GPCR construct in
the host cell.

Various methods are known in the art for
introducing nucleic acid molecules into host cells.
One method is microinjection, in which DNA is injected
directly into the nucleus of cells through fine glass
needles (or RNA is injected directly into the cytoplasm
of cells). Alternatively, DNA can be incubated with an
inert carbohydrate polymer (e.g. dextran) to which a
positively charged chemical group (e.g.
diethylaminoethyl ("DEAE")) has been coupled. The DNA
sticks to the DEAE-dextran via its negatively charged
phosphate groups. These large DNA-containing
particles, in turn, stick to the surfaces of cells,
which are thought to take them in by a process known as
endocytosis. Some of the DNA evades destruction in the
cytoplasm of the cell and escapes to the nucleus, where
it can be transcribed into RNA like any other gene in
the cell. In another method, cells efficiently take in
DNA in the form of a precipitate with calcium
phosphate. In electroporation, cells are placed in a
solution containing DNA and subjected to a brief
electrical pulse that causes holes to open transiently
in their membranes. DNA enters through the holes
directly into the cytoplasm, bypassing the endocytotic
vesicles through which it passes in the DEAE-dextran
and calcium phosphate procedures (passage through these
vesicles may sometimes destroy or damage DNA). DNA can
also be incorporated into artificial lipid vesicles,
liposomes, which fuse with the cell membrane,
delivering their contents directly into the cytoplasm.
In an even more direct approach, used primarily with
plant cells and tissues, DNA is absorbed to the surface
of tungsten microprojectiles and fired into cells with
a device resembling a shotgun.

Further methods for introducing nucleic acid
molecules into cells involve the use of viral vectors.
Since viral growth depends on the ability to get the
viral genome into cells, viruses have devised clever
and efficient methods for doing it. Various viral
vectors have been used to transform mammalian cells,
such as vaccinia virus, adenovirus, and retrovirus.

As indicated, some of these methods of
transforming a cell require the use of an intermediate
plasmid vector. U.S. Patent No. 4,237,224 to Cohen and
Boyer describes the production of expression systems in
the form of recombinant plasmids using restriction
enzyme cleavage and ligation with DNA ligase. These
recombinant plasmids are then introduced by means of
transformation and replicated in unicellular cultures
including procaryotic organisms and eucaryotic cells
grown in tissue culture. The DNA sequences are cloned
into the plasmid vector using standard cloning
procedures known in the art, as described by Sambrook
et al. (1989).

Host cells into which the nucleic acid 
encoding the construct has been introduced can be used
to produce (i.e. to functionally express) the GPCR
construct. The cell can then be monitored to determine
whether the peptide tethered to the GPCR is an agonist
or negative antagonist (in the case of a constitutively
active GPCR) of the GPCR. The method of monitoring can
be chosen based on the signalling pathway of the GPCR,
or the construct can further include marker or reporter
systems as discussed in further detail below. For
example, if the G protein coupled receptor signals
through an ion channel pathway, the monitoring can
comprise detecting levels of the ion within the cell.
If the G protein coupled receptor signals through a
calcium ion channel pathway, the cell to be used can be
a Xenopus oocyte and the monitoring can comprise
voltage clamp analysis. If the G protein coupled
receptor signals through a cyclic adenosine
monophosphate pathway, the monitoring can comprise
detecting levels of cyclic adenosine monophosphate
within the cell.

The invention further provides a cell
comprising the G protein coupled receptor construct of
the subject invention, as well as an expression vector
comprising the construct. A host cell comprising the
expression vector is also provided. Such expression
vectors include a plasmid and a virus. Preferably, the
cell into which the construct or expression vector
(comprising the construct) is introduced is a Xenopus
oocyte, a mammalian cell (such as COS-1 cells; see
Gershengorn and Osman 1996), or a yeast cell.

EXAMPLE I 
Peptide Agonists of hFSH-R 

A combinatorial peptide library was
constructed that expresses random pentapeptides
tethered to the seven transmembrane helical bundle of
the human follicle-stimulating hormone receptor
(hFSH-R). This library encompasses all 20 natural
amino acids at each of the five positions and,
therefore, has a complexity of 20^5 = 3.2 x 10^6 possible
combinations. To this end, the complementary DNA
sequence that normally encodes the hFSH-R's amino
terminal extracellular domain was substituted by a DNA
sequence that encodes the thrombin receptor's amino
terminal ectodomain. The chimeric human THR-R/FSH-R
has the variable pentapeptide sequence substituting for
the native peptide sequence that is normally unmasked
by thrombin action and constitutes the thrombin
receptor agonist peptide, but it retains thrombin
binding sequences and the thrombin specific cleavage
site. Therefore, the amino terminus of expressed
receptors is cleavable by thrombin at the appropriate
location exposing a new amino terminus that is made of
the variable pentapeptide segment of the library
tethered to the transmembrane domains of hFSH-R.

To monitor for cell surface expression and
efficient cleavage by thrombin of the amino terminal
end of the chimeric receptors, an epitope-tag to which
antibodies are available was positioned proximally to
the thrombin cleavage site. Antibodies that recognize
thrombin receptor amino terminus distal to the position
corresponding to the library are also available.
Consequently, chimeric receptors expressed on the cell
surface are detectable by the appropriate use of both
types of specific antibodies before thrombin treatment,
but only with antibodies against the distal part after
thrombin treatment.

The amino acid sequence of the chimeric human
THR-R/FSH-R is shown in SEQ ID NO: 2:

MDSKGSSQKGSRLLLLLVVSNLLLCQGVVSDYKDDDDKLDATLDPRXXXXXNPNDKYEPFWEDEEK
3 5 NESGLTBYRLVSINKSSPWKQLPAFISEDAGGYI^YNILRVLIWFISIIJVITGKIIVLVILTTSQ 
YKLTVPRFLMCNLAFADLCIGIYLLLIASVDIHTKSQYHNYAIDWQTGAGCDAAGFFTVFASELSV 
VTLTAITI^RWHTITHAMQU)CKVQLRHAASVMVMGWIFAFAAALFPIFGISSYMKVSICLPMDID 
SPLSQLYVMSLLVLNVIAFWICGCYIHIYLTVRNPNIVSSSSiyrRIAKRMAMLIFTDFI^^ 
PFMSASUC\^LIWSKAKILL\niFHPIHSCANPFLYAIFTKNFWU>FFIIiSKC
TETSSTVHNTHPRHGHCSSAPRVTNGSTYILVPLSHIAQN 

The construct consists of 436 amino acids: amino acid
residues 1-30 represent the prolactin signal peptide
(SEQ ID NO:3: MDSKGSSQKGSRLLLLLVVSNLLLCQGVVS); residues
31-38 represent the FLAG epitope (SEQ ID
NO:4: DYKDDDDK); residues 39-101 represent amino acids
from the hTHR receptor of which residues 47-51
represent the pentapeptide (SEQ ID NO:5: XXXXX) and
residues 57-74 represent the hirudin epitope; and
residues 102-436 represent amino acids from the hFSH
receptor of which residues 108-128 represent
transmembrane domain 1, residues 140-162 represent
transmembrane domain 2, residues 185-206 represent
transmembrane domain 3, residues 227-250 represent
transmembrane domain 4, residues 270-291 represent
transmembrane domain 5, residues 316-338 represent
transmembrane domain 6, and residues 350-371 represent
transmembrane domain 7. The signal peptide cleavage
site lies between amino acid residues 30 and 31 of SEQ
ID NO: 2, and the thrombin cleavage site lies between
amino acid residues 46 and 47 of SEQ ID NO:2. Cleavage
with thrombin thus exposes the pentapeptide that is
amino acid residues 47-51 of SEQ ID NO:2.
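
The residue numbering above can be checked with a short Python sketch. It assembles only the first 101 residues of the construct from the segments named in the text (the hFSH-R portion is omitted). The signal peptide string is read from the translation shown with the assembled nucleotide sequence. The helper names cleave_after and region are hypothetical and are given for illustration only.

```python
# Amino-terminal 101 residues of the chimeric construct, assembled from the
# segments named in the text. X marks the variable library positions.
SIGNAL = "MDSKGSSQKGSRLLLLLVVSNLLLCQGVVS"  # residues 1-30, prolactin signal peptide
FLAG = "DYKDDDDK"                          # residues 31-38, FLAG epitope
THR_PRE = "LDATLDPR"                       # residues 39-46, released by thrombin
LIBRARY = "XXXXX"                          # residues 47-51, library pentapeptide
THR_POST = "NPNDKYEPFWEDEEKNESGLTEYRLVSINKSSPLQKQLPAFISEDASGYL"  # residues 52-101

N_TERMINUS = SIGNAL + FLAG + THR_PRE + LIBRARY + THR_POST


def cleave_after(seq, residue):
    """Split seq between 1-based positions residue and residue + 1."""
    return seq[:residue], seq[residue:]


def region(seq, start, end):
    """Return the 1-based, inclusive residue range start..end."""
    return seq[start - 1:end]


if __name__ == "__main__":
    # Signal peptidase cuts between residues 30 and 31; thrombin between 46 and 47.
    _, mature = cleave_after(N_TERMINUS, 30)
    released, tethered = cleave_after(N_TERMINUS, 46)
    print("mature amino terminus starts with:", mature[:8])
    print("after thrombin, amino terminus starts with:", tethered[:5])
    print("FLAG epitope still attached after thrombin:", FLAG in tethered)
    print("residues 57-74:", region(N_TERMINUS, 57, 74))
```

In this sketch the FLAG epitope lies within the fragment released by thrombin, which agrees with the statement that only antibodies against the distal part detect the receptor after thrombin treatment.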

The construction of the DNA sequence encoding
the amino acid sequence shown in SEQ ID NO: 2 took
several steps that are described below:
1) Construction of a sequence encoding the
prolactin signal peptide (SEQ ID NO:3) followed by a
FLAG epitope-tag (SEQ ID NO:4: DYKDDDDK) placed
immediately upstream of the putative mature sequence
for human thrombin receptor amino terminus ectodomain
(from amino acids 34 to 95, SEQ ID NO:1) was produced
by gene synthesis using standard techniques as
described (Nussenzveig 1994). Synthetic
oligonucleotides obtained for the prolactin leader
sequence-FLAG epitope-tag construction have the
following sequences: coding strand PROLAC-1: SEQ ID
NO: 6: 5' - AAT TCC ACC ATG GAC TCC AAG GGC TCG AGC CAG
AAG GGA TCT AGA CTG CT - 3'; complementary strand
PROLAC-2: SEQ ID NO: 7: 5' - PO4 - CAG CAG CAG TCT AGA TCC
CTT CTG GCT CGA GCC CTT GGA GTC CAT GGT GG - 3'; coding
strand PROLAC-3: SEQ ID NO: 8: 5' - PO4 - G CTG CTG CTG GTG
GTG AGC AAC CTG CTG CTG TGC CAG GGC GTC GTG - 3';
complementary strand PROLAC-4: SEQ ID NO:9: 5' - PO4 - CGC
TCA CGA CGC CCT GGC ACA GCA GCA GGT TGC TCA CCA CCA GCA
G - 3'; FLAG-SENSE: SEQ ID NO: 10: 5' - PO4 - AGC GAC TAC
AAG GAC GAC GAC GAC AAG CTT CCT GCC TTT T - 3';
FLAG-ANTI-SENSE: SEQ ID NO: 11: 5' - CGA AAA GGC AGG AAG
CTT GTC GTC GTC GTC CTT GTA GT - 3'. The pairs of
oligonucleotides PROLAC-1/PROLAC-2, PROLAC-3/PROLAC-4,
and FLAG-SENSE/FLAG-ANTI-SENSE were annealed separately
at 20 µM final oligonucleotide concentration, by
heating at 95 °C for 5 min and cooling to 4 °C at a rate
of 1 °C every 3 min, in 20 mM Tris-Cl pH 7.6 and 10 mM
MgCl2 buffer, using a thermal controller apparatus.
Double stranded DNA was purified by agarose gel
electrophoresis using the Mermaid™ purification system
(Bio 101). Purified double stranded oligonucleotides
were ligated using equal molar concentrations. Ligation
products were digested with HindIII after heat
inactivation of T4 DNA ligase. The resulting 125 bp
larger fragment was purified by agarose gel
electrophoresis using the Mermaid™ kit. The fragment of
interest was subcloned into the EcoRI and HindIII sites of
pBSSKII(+). Correctness of the sequence was verified by
the dideoxynucleotide sequencing method using the Circumvent
sequencing kit (New England Biolabs, Inc.).
2) Construction of a sequence encoding the human
thrombin receptor amino terminus from amino acid
residue F to L (residues 60-101 of SEQ ID NO: 2) was
obtained by assembling four synthetic overlapping
oligonucleotides containing gaps from 10 to 33
nucleotides: coding strand THRR-1: SEQ ID NO:12: 5' -
TAT GCC ACC TTT TGG GAG GAT GAG GAG AAA AAT GAA AGT GGG
TTA ACT GAA TAC - 3'; complementary strand THRR-2: SEQ
ID NO: 13: 5' - TG AAG AGG ACT GCT TTT ATT GAT GGA GAC
TAA TCT GTA TTC AGT TAA CCC ACT TTC - 3'; coding strand
THRR-3: SEQ ID NO: 14: 5' - C AAT AAA AGC AGT CCT CTT
CAA AAA CAA CTT CCT GCA TTC ATC TCA GAA GAT GCC - 3';
complementary strand THRR BstEII: SEQ ID NO: 15: 5' -
GT CAG GTA ACC GGA GGC ATC TTC TGA GAT GAA TGC AAG -
3'. Oligonucleotide THRR BstEII inadvertently mutated
the hTHRR codon for P into L. Oligonucleotides THRR-2 and
THRR-3 were phosphorylated enzymatically using T4
polynucleotide kinase in 50 mM Tris-HCl pH 7.5, 10 mM
MgCl2, 10 mM dithiothreitol, 1 mM ATP and 25 µg/ml BSA.
Oligonucleotides THRR-1, THRR-2, THRR-3 and THRR BstEII
were annealed at a final concentration of 10 µM in 20
mM Tris-Cl pH 7.6, 10 mM MgCl2 buffer by heating at 95
°C for 5 min and cooling to 4 °C at a rate of 1 °C per 8
min using a thermal controller apparatus. The gaps
between annealed oligonucleotides were filled in using
T4 DNA polymerase. The reaction was performed at a final
concentration of 2.5 µM oligonucleotides, 400 µM dNTPs,
50 mM NaCl, 15 mM Tris-HCl, 12.5 mM MgCl2, 1 mM
dithiothreitol, 50 ng/ml BSA, pH 7.9 at 25 °C for 60
min. The reaction was stopped with 25 mM EDTA and the enzyme
was heat inactivated at 65 °C for 60 min. T4 DNA
polymerase was selected not only to avoid strand
displacement of the overlapping oligonucleotides but
also because of its 3' to 5' exonuclease activity to
correct the inadvertent mutation P to L.

3) Construction of a nucleotide sequence
encoding amino acid residues from G to N (residues
102-436 of SEQ ID NO: 2) followed by the stop codon of
the hFSHR was obtained by the standard Taq polymerase PCR
method using the following pair of oligonucleotides: i)
coding strand FSHR BstEII: SEQ ID NO:16: 5' - T GAA GGT
TAC CTG GGG TAC AAC ATC CTC AGA GTC C - 3'; and ii)
complementary strand NotI 2170: SEQ ID NO:17: 5' - TCA
CGC GGC CGC TTA GTT TTG GGC TAA ATG ACT TAG AGG - 3'.
The resultant PCR product creates BstEII and NotI sites
at the 5' and 3' ends of the coding strand,
respectively. The BstEII site was used to connect the
hFSHR sequence in frame with the amino terminus
ectodomain of the hTHRR. The NotI site was used to
connect the chimeric construct to the expression
vector. The resulting PCR product was cloned into a
pBSSKII(+)AT vector prepared to receive PCR fragments
containing non-template dependent addition of 3'
A-overhangs. The pBSSKII(+)AT vector, which contains 3'
T-overhangs, was obtained by ligating phosphorylated
oligonucleotides AT-SENSE: SEQ ID NO: 18: 5' - PO4 AAT
TCG GCT T - 3' and AT-ANTI-SENSE: SEQ ID NO: 19: 5' - PO4
AGC CG - 3' into pBSSKII(+) vector cut with EcoRI. A
clone was selected with the orientation that places the
newly created BstEII site closer to the SacI site of
the vector and the NotI site of the insert closer to
the KpnI site of the vector.
4) Modification of the hFSHR construct obtained
in item # 3 through the production of silent mutations
to destroy the two PflMI sites originally present at
positions 1,379 and 2,080 of the hFSHR cDNA. The hFSHR DNA
sequence was modified by PCR mutagenesis, using the
construct obtained in item # 3 as a template in a
standard PCR reaction with Taq polymerase and the
following three pairs of primers: i) PvuI 1379 -
ANTI-SENSE: SEQ ID NO: 20: 5' - CA GTC GAT CGC ATA GTT
GTG ATA TTG GCT C - 3' and vector REVERSE PRIMER: SEQ
ID NO:21: 5' - AAC AGC TAT GAC CAT G - 3'. This PCR
fragment contains a silent mutation that at the same
time destroys a PflMI site present at position 1,379 of
the human FSHR cDNA and introduces a PvuI site at the
same position. ii) SacII 2080 - SENSE: SEQ ID NO:22: 5'
- C CAT CCG CGG AAT GGC CAC TGC TCT TCA GC - 3' and
vector M13 (-20) PRIMER: SEQ ID NO: 23: 5' - GTA AAA CGA
CGG CCA GT - 3'. This PCR fragment contains a silent
mutation that at the same time destroys a PflMI site
present at position 2,080 of the human FSHR cDNA and
introduces a SacII site at the same position. iii) PvuI
1379 - SENSE: SEQ ID NO: 24: 5' - TAT GCG ATC GAC TGG
CAA ACT GGG GCA GG - 3' and SacII 2080 - ANTI-SENSE:
SEQ ID NO: 25: 5' - C ATT CCG CGG ATG GGT GTT GTG GAC
AGT G - 3'. This PCR fragment contains two silent
mutations that simultaneously change a PflMI site
present at position 1,379 into a PvuI site and a PflMI
site present at position 2,080 into a SacII site
(numbers refer to the nucleotide sequence of the
original hFSHR cDNA). The PCR fragments originated with
these oligonucleotide pairs were treated as follows: i) was cut with the restriction
enzymes BstEII and PvuI and a 225 bp DNA fragment was
purified by agarose gel electrophoresis followed by the
GeneClean™ procedure; ii) was cut with the restriction
enzymes SacII and ApaI and a 700 bp DNA fragment was
purified by agarose gel electrophoresis followed by the
GeneClean™ procedure; iii) was cut with the restriction
enzymes PvuI and SacII and a 140 bp DNA fragment was
purified by agarose gel electrophoresis followed by the
Mermaid™ procedure. The construct obtained in step # 3 was
cut with the restriction enzymes BstEII and ApaI and a
2.9 kbp DNA fragment was purified by agarose gel
electrophoresis followed by the GeneClean™ procedure.
Finally the four purified DNA fragments were ligated
together to produce a modified hFSHR construct in
pBSSKII(+)AT vector.

5) Assembling of the hTHRR amino terminus
ectodomain obtained from step # 2 with the modified
hFSHR construct obtained from step # 4: the DNA
fragment encoding the hTHRR amino terminus from amino
acid residue F to L obtained from step # 2 was
digested with the restriction enzyme BstEII; the
modified pBSSKII(+)AT-hFSHR construct obtained from
step # 4 was linearized with the restriction enzyme
SacI and blunted with T4 DNA polymerase to remove 3'
overhangs. After enzyme inactivation, blunted linear
pBSSKII(+)AT-hFSHR was cut with the restriction enzyme
BstEII. After appropriate DNA fragment purification the
two modified DNA fragments (hTHRR Blunt-BstEII ~130 bp
long and pBSSKII(+)AT-hFSHR Blunt-BstEII ~3,800 bp
long) generated at this step (# 5) were ligated
together to form the intermediate construct
pBSSKII(+)hTHRR/hFSHR. Correctness of the sequence was
verified by the dideoxynucleotide sequencing method using
the Circumvent sequencing kit (New England Biolabs, Inc.).
6) The plasmid construct generated at step # 1
(pBSSKII(+) PROLAC FLAG (EcoRI-HindIII)) was cut with
the restriction enzymes HindIII and ApaI and the larger
fragment was purified by the GeneClean™ procedure.
The plasmid construct obtained at step # 5 (intermediate
pBSSKII(+)hTHRR/hFSHR) was cut with the restriction
enzymes PflMI and ApaI and the resulting ~1,100 bp DNA
fragment was purified by agarose gel electrophoresis
followed by the GeneClean™ procedure. A pair of
oligonucleotides CONNECT-SENSE: SEQ ID NO: 26: 5' - AG
CTT GAT GCC ACG CTA TGG CCC TAG GTA AGT GAT ATG CCA CCT
T - 3'; and CONNECT-ANTI-SENSE: SEQ ID NO: 27: 5' - G
TGG CAT ATC ACT TAC CTA GGG CCA TAG CGT GGC ATC A - 3'
were annealed and purified using the procedure
described in step # 1. Annealed
CONNECT-SENSE/CONNECT-ANTI-SENSE oligonucleotides were
used to adapt the overhang created by HindIII digestion
of pBSSKII(+) PROLAC FLAG with the overhang created by
PflMI digestion of intermediate pBSSKII(+)hTHRR/hFSHR,
both mentioned above. Therefore, a ligation of the two
DNA fragments purified above was performed using the
adaptor CONNECT-SENSE/CONNECT-ANTI-SENSE. This adaptor
not only regenerates the two restriction enzyme sites,
HindIII and PflMI, but also introduces a second PflMI
site between the HindIII and the first PflMI sites.
The PflMI restriction enzyme recognition sequence is an
interrupted palindrome that allows for the directional
cloning of DNA fragments with appropriate overhangs. The
two PflMI restriction enzyme sites were created to
subclone a synthetic DNA fragment that introduces, in
frame with both the prolactin leader sequence/FLAG
epitope tag and the chimeric hTHRR/hFSHR sequence
described in step # 5, the variable pentapeptide
agonist library preceded and followed by hTHRR amino
acid residues corresponding to hTHRR residues from L
to P. The CONNECT-SENSE/CONNECT-ANTI-SENSE adaptor
sequence introduces stop codons at each of the three
possible reading frames to decrease background levels
during the construction of the library. Correctness of
the sequence was verified by the dideoxynucleotide
sequencing method using automatic sequencing.
7) Subcloning of the construct obtained at step
# 6 into the mammalian expression vector pcDNA3 and
into unmodified pBSSKII(+). The plasmid obtained in step #
6 was digested with EcoRI and NotI, followed by
purification of the ~1300 bp DNA insert. pcDNA3 and
pBSSKII(+) were both prepared by cutting with EcoRI and
NotI. Insert and vectors were ligated. A library
created in the vector pcDNA3 is used in transfection
experiments using mammalian cells. Fig. 9 shows the
plasmid map for the THRR/FSHR construct designated
pcDNA3PROLACFLAGhTHRR/hFSHR. A library created in the
vector pBSSKII(+) is used for in vitro transcription
after linearization with the restriction enzyme NotI.
Resulting RNAs are injected into Xenopus oocytes.
8) Library construction: three oligonucleotides, coding
strand LIBRARY-1: SEQ ID NO:28: 5' - PO4 - A GAT
CCC CGG NNS NNS NNS NNS NNS AAC CCC AAT GAT AAA TAT GAA
CCC TT - 3', where N means all four nucleotides and S
means either G or C, a degenerate oligonucleotide pool
with 2^25 different nucleic acid molecules, encoding 20^5
different pentapeptide sequences; complementary strand
LIBRARY-2: SEQ ID NO:29: 5' - PO4 - CCG GGG ATC TAG C -
3'; and complementary strand LIBRARY-3: SEQ ID NO:30:
5' - PO4 - GG TTC ATA TTT ATC - 3', were annealed at a
molar ratio of 1 (LIBRARY-1) : 25 (LIBRARY-2) : 25
(LIBRARY-3) in 20 mM Tris-HCl pH 7.6, 10 mM MgCl2, by
heating at 95 °C for 5 min and cooling to 4 °C at a rate
of 1 °C per 8 min using a thermal controller apparatus.
Annealed oligonucleotides were purified by agarose gel
electrophoresis using the Mermaid™ kit. pcDNA3
PROLACFLAG-CONNECT-SENSE/CONNECT-ANTI-SENSE-hTHRR/hFSHR
or pBSSKII(+) PROLACFLAG-CONNECT-SENSE/CONNECT-ANTI-
SENSE-hTHRR/hFSHR from step #7 were cut to completion
with the restriction enzyme PflMI. The purified large
fragment of each construct was ligated with the
annealed library oligonucleotides at approximately 1:3
molar ratios. The ligated products were ethanol
precipitated and redissolved in water for
transformation of E. coli XL1-Blue cells by
electroporation.
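The size of the degenerate pool follows directly from the NNS design and can be checked with a short calculation. The following Python sketch is illustrative only: it assumes the standard genetic code, and the names `CODON_TABLE` and `nns_codons` are hypothetical helpers, not part of the described method.

```python
from itertools import product

# Standard genetic code, laid out in the conventional TCAG order (assumption:
# the library is expressed in a host using the standard code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def nns_codons():
    """All codons matching NNS: N = A, C, G or T; S = C or G."""
    return [a + b + c for a, b, c in product("ACGT", "ACGT", "CG")]


codons = nns_codons()
encoded = {CODON_TABLE[c] for c in codons}
amino_acids = encoded - {"*"}                     # drop the stop symbol
stop_codons = [c for c in codons if CODON_TABLE[c] == "*"]

dna_complexity = len(codons) ** 5                 # five NNS positions
peptide_complexity = len(amino_acids) ** 5

print(len(codons), len(amino_acids), stop_codons)
print(dna_complexity, peptide_complexity)
```

Under these assumptions each NNS position allows 32 codons (32^5 = 2^25 DNA sequences in total) that cover all 20 amino acids plus the single stop codon TAG, so a small fraction of clones would be expected to carry a premature stop within the pentapeptide.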

Shown below are the nucleotide sequences of
PROLAC FLAG-CONNECT-SENSE/CONNECT-ANTI-SENSE-
hTHRR/hFSHR and of the assembled LIBRARY-1, -2 and -3
oligonucleotides to be ligated into the two PflMI sites
of the insert, substituting for most of the sequence
corresponding to CONNECT-SENSE/CONNECT-ANTI-SENSE:

Eco R I AATTCCACC ATG GAC TCC AAG GGC TCG AGC CAG AAG
            GGTGG TAC CTG AGG TTC CCG AGC TCG GTC TTC
                   M   D   S   K   G   S   S   Q   K

GGA TCT AGA CTG CTG CTG CTG CTG GTG GTG AGC AAC CTG CTG
CCT AGA TCT GAC GAC GAC GAC GAC CAC CAC TCG TTG GAC GAC
 G   S   R   L   L   L   L   L   V   V   S   N   L   L

CTG TGC CAG GGC GTC GTG AGC GAC TAC AAG GAC GAC GAC GAC
GAC ACG GTC CCG CAG CAC TCG CTG ATG TTC CTG CTG CTG CTG
 L   C   Q   G   V   V   S   D   Y   K   D   D   D   D

A Hind III AG CTT GAT GCC ACG CT Pfl M I A TGG CCC TAG
       TTC GA   A CTA CGG TG         C GAT ACC GGG ATC
K              L   D   A   T   L                    *

GTA AGT GAT ATG CCAC C TT Pfl M I T TGG GAG GAT GAG GAG
CAT TCA CTA TAC GGTG          G AAA ACC CTC CTA CTC CTC
                                  F  W   E   D   E   E

AAA AAT GAA AGT GGG TTA ACT GAA TAC AGA TTA GTC TCC ATC
TTT TTA CTT TCA CCC AAT TGA CTT ATG TCA AAT CAG AGG TAG
 K   N   E   S   G   L   T   E   Y   R   L   V   S   I

AAT AAA AGC AGT CCT CTT CAA AAA CAA CTT CCT GCA TTC ATC
TTA TTT TCG TCA GGA GAA GTT TTT GTT GAA GGA CGT AAG TAG
 N   K   S   S   P   L   Q   K   Q   L   P   A   F   I

TCA GAA GAT GCC TCC G Bst E II GT TAC CTG GGG TAC AAC
AGT CTT CTA CGG AGG           CCA ATG GAC CCC ATG TTG
 S   E   D   A   S             G   Y   L   G   Y   N

ATC CTC AGA GTC CTG ATA TGG TTT ATC AGC ATC CTG GCC ATC
TAG GAG TCT CAG GAC TAT ACC AAA TAG TCG TAG GAC CGG TAG
 I   L   R   V   L   I   W   F   I   S   I   L   A   I

ACT GGG AAC ATC ATA GTG CTA GTG ATC CTA ACT ACC AGC CAA
TGA CCC TTG TAG TAT CAC GAT CAC TAG GAT TGA TGG TCG GTT
 T   G   N   I   I   V   L   V   I   L   T   T   S   Q

TAT AAA CTC ACA GTC CCC AGG TTC CTT ATG TGC AAC CTG GCC
ATA TTT GAG TGT CAG GGG TCC AAG GAA TAC ACG TTG GAC CGG
 Y   K   L   T   V   P   R   F   L   M   C   N   L   A

TTT GCT GAT CTC TGC ATT GGA ATC TAC CTG CTG CTC ATT GCA
AAA CGA CTA GAG ACG TAA CCT TAG ATG GAC GAC GAG TAA CGT
 F   A   D   L   C   I   G   I   Y   L   L   L   I   A

TCA GTT GAT ATC CAT ACC AAG AGC CAA TAT CAC AAC TAT GCG
AGT CAA CTA TAG GTA TGG TTC TCG GTT ATA GTG TTG ATA CGC
 S   V   D   I   H   T   K   S   Q   Y   H   N   Y   A

AT Pvu I C GAC TGG CAA ACT GGG GCA GGC TGT GAT GCT GCT
       TAG CTG ACC GTT TGA CCC CGT CCG ACA CTA CGA CGA
        I   D   W   Q   T   G   A   G   C   D   A   A

GGC TTT TTC ACT GTC TTT GCC AGT GAG CTG TCA GTC TAC ACT
CCG AAA AAG TGA CAG AAA CGG TCA CTC GAC AGT CAG ATG TGA
 G   F   F   T   V   F   A   S   E   L   S   V   Y   T

CTG ACA GCT ATC ACC TTG GAA AGA TGG CAT ACC ATC ACG CAT
GAC TGT CGA TAG TGG AAC CTT TCT ACC GTA TGG TAG TGC GTA
 L   T   A   I   T   L   E   R   W   H   T   I   T   H

GCC ATG CAG CTG GAC TGC AAG GTG CAG CTC CGC CAT GCT GCC
CGG TAC GTC GAC CTG ACG TTC CAC GTC GAG GCG GTA CGA CGG
 A   M   Q   L   D   C   K   V   Q   L   R   H   A   A

AGT GTC ATG GTG ATG GGC TGG ATT TTT GCT TTT GCA GCT GCC
TCA CAG TAC CAC TAC CCG ACC TAA AAA CGA AAA CGT CGA CGG
 S   V   M   V   M   G   W   I   F   A   F   A   A   A

CTC TTT CCC ATC TTT GGC ATC AGC AGC TAC ATG AAG GTG AGC
GAG AAA GGG TAG AAA CCG TAG TCG TCG ATG TAC TTC CAC TCG
 L   F   P   I   F   G   I   S   S   Y   M   K   V   S

ATC TGC CTG CCC ATG GAT ATT GAC AGC CCT TTG TCA CAG CTG
TAG ACG GAC GGG TAC CTA TAA CTG TCG GGA AAC AGT GTC GAC
 I   C   L   P   M   D   I   D   S   P   L   S   Q   L

TAT GTC ATG TCC CTC CTT GTG CTC AAT GTC CTG GCC TTT GTG
ATA CAG TAC AGG GAG GAA CAC GAG TTA CAG GAC CGG AAA CAC
 Y   V   M   S   L   L   V   L   N   V   L   A   F   V

GTC ATC TGT GGC TGC TAT ATC CAC ATC TAC CTC ACA GTG CGG
CAG TAG ACA CCG ACG ATA TAG GTG TAG ATG GAG TGT CAC GCC
 V   I   C   G   C   Y   I   H   I   Y   L   T   V   R

AAC CCC AAC ATC GTG TCC TCC TCT AGT GAC ACC AGG ATC GCC
TTG GGG TTG TAG CAC AGG AGG AGA TCA CTG TGG TCC TAG CGG
 N   P   N   I   V   S   S   S   S   D   T   R   I   A

AAG CGC ATG GCC ATG CTC ATC TTC ACT GAC TTC CTC TGC ATG
TTC GCG TAC CGG TAC GAG TAG AAG TGA CTG AAG GAG ACG TAC
 K   R   M   A   M   L   I   F   T   D   F   L   C   M

GCA CCC ATT TCT TTC TTT GCC ATT TCT GCC TCC CTC AAG GTG
CGT GGG TAA AGA AAG AAA CGG TAA AGA CGG AGG GAG TTC CAC
 A   P   I   S   F   F   A   I   S   A   S   L   K   V

CCC CTC ATC ACT GTG TCC AAA GCA AAG ATT CTG CTG GTT CTG
GGG GAG TAG TGA CAC AGG TTT CGT TTC TAA GAC GAC CAA GAC
 P   L   I   T   V   S   K   A   K   I   L   L   V   L

TTT CAC CCC ATC AAC TCC TGT GCC AAC CCC TTC CTC TAT GCC
AAA GTG GGG TAG TTG AGG ACA CGG TTG GGG AAG GAG ATA CGG
 F   H   P   I   N   S   C   A   N   P   F   L   Y   A

ATC TTT ACC AAA AAC TTT CGC AGA GAT TTC TTC ATT CTG CTG
TAG AAA TGG TTT TTG AAA GCG TCT CTA AAG AAG TAA GAC GAC
 I   F   T   K   N   F   R   R   D   F   F   I   L   L

AGC AAG TGT GGC TGC TAT GAA ATG CAA GCC CAA ATT TAT AGG
TCG TTC ACA CCG ACG ATA CTT TAC GTT CGG GTT TAA ATA TCC
 S   K   C   G   C   Y   E   M   Q   A   Q   I   Y   R

ACA GAA ACT TCA TCC ACT GTC CAC AAC ACC CAT CCG C Sac II
TGT CTT TGA AGT AGG TGA CAG GTG TTG TGG GTA GG
 T   E   T   S   S   T   V   H   N   T   H   P

   GG AAT GGC CAC TGC TCT TCA GCT CCC AGA GTC ACC AAT
C GCC TTA CCG GTG ACG AGA AGT CGA GGG TCT CAG TGG TTA
   R   N   G   H   C   S   S   A   P   R   V   T   N

GGT TCC ACT TAC ATA CTT GTC CCT CTA AGT CAT TTA GCC CAA
CCA AGG TGA ATG TAT GAA CAG GGA GAT TCA GTA AAT CGG GTT
 G   S   T   Y   I   L   V   P   L   S   H   L   A   Q

AAC TAA GC Not I
TTG ATT CGCCGG
 N   *

The DNA sequence (sense strand) is shown in
SEQ ID NO:31, with the antisense strand shown in SEQ ID
NO:32. The DNA sequence is the sequence that would
encode the amino acid sequence (SEQ ID NO:33) of the
chimeric preprotein that is prolactin signal
peptide/FLAG epitope tag/hTHR receptor amino terminus
corresponding to amino acid residues 34 to 96 in the
native receptor/hFSH receptor. There are three
nonsense ("stop") codons (one in each potential reading
frame) in the middle of the sequence encoding the hTHR
receptor amino terminus that are present to prevent
translation of this precursor sequence if this sequence
persisted, that is, remained uncut, during construction
of the final library (see below). These "stop" codons,
therefore, would prevent translation of non-recombinant
protein. To construct the library, this sequence (SEQ
ID NO:31) is cut with PflMI to excise one small DNA
fragment flanked by two PflMI restriction sites, which is
replaced with the following DNA sequences (sense, SEQ
ID NO:28; antisense, SEQ ID NO:29 and SEQ ID NO:30)
that encode the pentapeptide library (amino acid SEQ
ID NO:42):



Pfl M I A GAT CCC CGG NNS NNS NNS NNS NNS AAC CCC AAT
    C GAT CTA GGG GCC
    T   L  D   P   R   X   X   X   X   X   N   P   N

GAT AAA TAT GAA CCC TT Pfl M I
CTA TTT ATA CTT GG
 D   K   Y   E   P   F
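PflMI recognizes CCANNNN^NTGG, so the positions at which the insert is opened can be located with a simple pattern search. The Python sketch below is an illustration under stated assumptions: the sequence is the 5' portion of the sense strand as transcribed from the sequence listing (EcoRI end through the start of the retained hTHRR segment), `find_pflmi_sites` is a hypothetical helper, and the GTC to GTG change models the silent mutation that the text uses to create an additional site.

```python
import re

# 5' portion of the sense strand, transcribed from the sequence listing.
# Spaces are for readability only and are removed before searching.
SENSE_5PRIME = (
    "AATTCCACC ATG GAC TCC AAG GGC TCG AGC CAG AAG "
    "GGA TCT AGA CTG CTG CTG CTG CTG GTG GTG AGC AAC CTG CTG "
    "CTG TGC CAG GGC GTC GTG AGC GAC TAC AAG GAC GAC GAC GAC "
    "AAG CTT GAT GCC ACG CTA TGG CCC TAG GTAAGTGAT ATG CCACCTT "
    "TTGG GAG GAT GAG GAG"
).replace(" ", "")

# PflMI site: CCA, any five bases, TGG. A lookahead is used so that
# overlapping candidate sites would also be reported.
PFLMI = re.compile(r"(?=(CCA[ACGT]{5}TGG))")


def find_pflmi_sites(seq):
    """Return (0-based start, matched site) for every PflMI site in seq."""
    return [(m.start(), m.group(1)) for m in PFLMI.finditer(seq)]


# Silent mutation in the signal peptide region: GTC (Val) -> GTG (Val).
mutated = SENSE_5PRIME.replace("GGCGTCGTGAGC", "GGCGTGGTGAGC", 1)

original_sites = find_pflmi_sites(SENSE_5PRIME)
mutated_sites = find_pflmi_sites(mutated)

print(original_sites)
print(mutated_sites)
```

In this model the unmodified construct contains the two sites flanking the stop-codon connector, while the single-base silent change adds a third site at the end of the signal peptide, which is the arrangement the later construct relies on.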

Modification of the original construct with
the intent to create a combinatorial peptide library
that expresses random pentapeptides tethered to the
seven transmembrane helical bundle of any GPCR already
in an "active" or "exposed" form, without the need for
cleavage by thrombin. Use of the human follicle
stimulating hormone receptor (hFSH-R) as the initial
library construction:

In this version of the library, the variable
pentapeptide sequence is placed immediately after the
prolactin signal peptide. Consequently, the cleavage
produced by the signal peptidase that normally occurs
during synthesis of type III membrane proteins would
expose or "activate" the pentapeptide present at the
beginning of the amino terminus, allowing it to
interact with the seven transmembrane helical bundle of
the GPCR to which it is tethered. The resulting protein
sequence of the amino terminus ectodomain that can
replace the amino terminus ectodomain of any GPCR is
shown in SEQ ID NO:34:

MDSKGSSQKGSRLLLLLVVSNLLLCQGVVSXXXXXNPNDKYEPFWEDEEKNESGLTEYRLVSINKS
SPLQKQLPAFISEDASGYL

To create this construct it is necessary to
perform a small modification in the construct already
obtained for the discovery of peptide agonists for the
hFSHR using thrombin activation. By silent mutation
using the PCR method, it is possible to introduce a new
PflMI restriction endonuclease cleavage site in the
sequence of the prolactin signal peptide construct obtained
previously at step #1: PCR using the pair of
oligonucleotide primers: complementary strand PflMI
SIGNAL: SEQ ID NO:35: 5' - ATC AAG CTT GTC GTC GTC GTC
CTT GTA GTC GCT CAC CAC GCC CTG - 3' and vector M13
(-20) PRIMER: SEQ ID NO:23: 5' - GTA AAA CGA CGG CCA GT
- 3', using as template the construct pBSSKII(+) PROLAC
FLAG-CONNECT-SENSE/CONNECT-ANTI-SENSE-hTHRR/hFSHR.
The resulting PCR product is cut with EcoRI and HindIII.
This fragment will substitute the corresponding
fragment in pBSSKII(+) PROLAC FLAG-
CONNECT-SENSE/CONNECT-ANTI-SENSE-hTHRR/hFSHR,
therefore introducing a third PflMI restriction enzyme
recognition site at the desired position. See sequence
below (sense, SEQ ID NO:36; antisense, SEQ ID NO:37;
amino acid, SEQ ID NO:38):

Eco R I AATTCCACC ATG GAC TCC AAG GGC TCG AGC CAG AAG
            GGTGG TAC CTG AGG TTC CCG AGC TCG GTC TTC
                   M   D   S   K   G   S   S   Q   K

GGA TCT AGA CTG CTG CTG CTG CTG GTG GTG AGC AAC CTG CTG
CCT AGA TCT GAC GAC GAC GAC GAC CAC CAC TCG TTG GAC GAC
 G   S   R   L   L   L   L   L   V   V   S   N   L   L

CTG TGC CAG GGC Pfl M I GTG GTG AGC GAC TAC AAG GAC GAC
GAC ACG GTC         CCG CAC CAC TCG CTG ATG TTC CTG CTG
 L   C   Q   G           V   V   S   D   Y   K   D   D

GAC GAC A Hind III AG CTT GAT GCC ACG CT Pfl M I A TGG
CTG CTG        TTC GA   A CTA CGG TG         C GAT ACC
 D   D  K              L   D   A   T   L          L  W

CCC TAG GTA AGT GAT ATG CCAC C TT Pfl M I T TGG GAG GAT
GGG ATC CAT TCA CTA TAC GGTG          G AAA ACC CTC CTA
 P   *    *   *                           F  W   E   D

GAG GAG AAA AAT GAA AGT GGG TTA ACT GAA TAC AGA TTA GTC TCC
CTC CTC TTT TTA CTT TCA CCC AAT TGA CTT ATG TCA AAT CAG AGG
 E   E   K   N   E   S   G   L   T   E   Y   R   L   V   S

ATC AAT AAA AGC AGT CCT CTT CAA AAA CAA CTT CCT GCA TTC ATC
TAG TTA TTT TCG TCA GGA GAA GTT TTT GTT GAA GGA CGT AAG TAG
 I   N   K   S   S   P   L   Q   K   Q   L   P   A   F   I

TCA GAA GAT GCC TCC G Bst E II GT TAC CTG GGG TAC AAC
AGT CTT CTA CGG AGG           CCA ATG GAC CCC ATG TTG
 S   E   D   A   S             G   Y   L   G   Y   N

ATC CTC AGA GTC CTG ATA TGG TTT ATC AGC ATC CTG GCC ATC
TAG GAG TCT CAG GAC TAT ACC AAA TAG TCG TAG GAC CGG TAG
 I   L   R   V   L   I   W   F   I   S   I   L   A   I

ACT GGG AAC ATC ATA GTG CTA GTG ATC CTA ACT ACC AGC CAA
TGA CCC TTG TAG TAT CAC GAT CAC TAG GAT TGA TGG TCG GTT
 T   G   N   I   I   V   L   V   I   L   T   T   S   Q

TAT AAA CTC ACA GTC CCC AGG TTC CTT ATG TGC AAC CTG GCC
ATA TTT GAG TGT CAG GGG TCC AAG GAA TAC ACG TTG GAC CGG
 Y   K   L   T   V   P   R   F   L   M   C   N   L   A

TTT GCT GAT CTC TGC ATT GGA ATC TAC CTG CTG CTC ATT GCA
AAA CGA CTA GAG ACG TAA CCT TAG ATG GAC GAC GAG TAA CGT
 F   A   D   L   C   I   G   I   Y   L   L   L   I   A

TCA GTT GAT ATC CAT ACC AAG AGC CAA TAT CAC AAC TAT GCG
AGT CAA CTA TAG GTA TGG TTC TCG GTT ATA GTG TTG ATA CGC
 S   V   D   I   H   T   K   S   Q   Y   H   N   Y   A

AT Pvu I C GAC TGG CAA ACT GGG GCA GGC TGT GAT GCT GCT
       TAG CTG ACC GTT TGA CCC CGT CCG ACA CTA CGA CGA
        I   D   W   Q   T   G   A   G   C   D   A   A

GGC TTT TTC ACT GTC TTT GCC AGT GAG CTG TCA GTC TAC ACT
CCG AAA AAG TGA CAG AAA CGG TCA CTC GAC AGT CAG ATG TGA
 G   F   F   T   V   F   A   S   E   L   S   V   Y   T

CTG ACA GCT ATC ACC TTG GAA AGA TGG CAT ACC ATC ACG CAT
GAC TGT CGA TAG TGG AAC CTT TCT ACC GTA TGG TAG TGC GTA
 L   T   A   I   T   L   E   R   W   H   T   I   T   H

GCC ATG CAG CTG GAC TGC AAG GTG CAG CTC CGC CAT GCT GCC
CGG TAC GTC GAC CTG ACG TTC CAC GTC GAG GCG GTA CGA CGG
 A   M   Q   L   D   C   K   V   Q   L   R   H   A   A

AGT GTC ATG GTG ATG GGC TGG ATT TTT GCT TTT GCA GCT GCC
TCA CAG TAC CAC TAC CCG ACC TAA AAA CGA AAA CGT CGA CGG
 S   V   M   V   M   G   W   I   F   A   F   A   A   A

CTC TTT CCC ATC TTT GGC ATC AGC AGC TAC ATG AAG GTG AGC
GAG AAA GGG TAG AAA CCG TAG TCG TCG ATG TAC TTC CAC TCG
 L   F   P   I   F   G   I   S   S   Y   M   K   V   S

ATC TGC CTG CCC ATG GAT ATT GAC AGC CCT TTG TCA CAG CTG
TAG ACG GAC GGG TAC CTA TAA CTG TCG GGA AAC AGT GTC GAC
 I   C   L   P   M   D   I   D   S   P   L   S   Q   L

TAT GTC ATG TCC CTC CTT GTG CTC AAT GTC CTG GCC TTT GTG
ATA CAG TAC AGG GAG GAA CAC GAG TTA CAG GAC CGG AAA CAC
 Y   V   M   S   L   L   V   L   N   V   L   A   F   V

GTC ATC TGT GGC TGC TAT ATC CAC ATC TAC CTC ACA GTG CGG
CAG TAG ACA CCG ACG ATA TAG GTG TAG ATG GAG TGT CAC GCC
 V   I   C   G   C   Y   I   H   I   Y   L   T   V   R

AAC CCC AAC ATC GTG TCC TCC TCT AGT GAC ACC AGG ATC GCC
TTG GGG TTG TAG CAC AGG AGG AGA TCA CTG TGG TCC TAG CGG
 N   P   N   I   V   S   S   S   S   D   T   R   I   A

AAG CGC ATG GCC ATG CTC ATC TTC ACT GAC TTC CTC TGC ATG
TTC GCG TAC CGG TAC GAG TAG AAG TGA CTG AAG GAG ACG TAC
 K   R   M   A   M   L   I   F   T   D   F   L   C   M

GCA CCC ATT TCT TTC TTT GCC ATT TCT GCC TCC CTC AAG GTG
CGT GGG TAA AGA AAG AAA CGG TAA AGA CGG AGG GAG TTC CAC
 A   P   I   S   F   F   A   I   S   A   S   L   K   V

CCC CTC ATC ACT GTG TCC AAA GCA AAG ATT CTG CTG GTT CTG
GGG GAG TAG TGA CAC AGG TTT CGT TTC TAA GAC GAC CAA GAC
 P   L   I   T   V   S   K   A   K   I   L   L   V   L

TTT CAC CCC ATC AAC TCC TGT GCC AAC CCC TTC CTC TAT GCC
AAA GTG GGG TAG TTG AGG ACA CGG TTG GGG AAG GAG ATA CGG
 F   H   P   I   N   S   C   A   N   P   F   L   Y   A

ATC TTT ACC AAA AAC TTT CGC AGA GAT TTC TTC ATT CTG CTG
TAG AAA TGG TTT TTG AAA GCG TCT CTA AAG AAG TAA GAC GAC
 I   F   T   K   N   F   R   R   D   F   F   I   L   L

AGC AAG TGT GGC TGC TAT GAA ATG CAA GCC CAA ATT TAT AGG
TCG TTC ACA CCG ACG ATA CTT TAC GTT CGG GTT TAA ATA TCC
 S   K   C   G   C   Y   E   M   Q   A   Q   I   Y   R

ACA GAA ACT TCA TCC ACT GTC CAC AAC ACC CAT CCG C Sac II
TGT CTT TGA AGT AGG TGA CAG GTG TTG TGG GTA GG
 T   E   T   S   S   T   V   H   N   T   H   P

   GG AAT GGC CAC TGC TCT TCA GCT CCC AGA GTC ACC AAT
C GCC TTA CCG GTG ACG AGA AGT CGA GGG TCT CAG TGG TTA
   R   N   G   H   C   S   S   A   P   R   V   T   N

GGT TCC ACT TAC ATA CTT GTC CCT CTA AGT CAT TTA GCC CAA
CCA AGG TGA ATG TAT GAA CAG GGA GAT TCA GTA AAT CGG GTT
 G   S   T   Y   I   L   V   P   L   S   H   L   A   Q

AAC TAA GC Not I
TTG ATT CGCCGG
 N   *



Two of the three LIBRARY oligonucleotides
also need to be modified as follows: coding strand
LIBRARY-4: SEQ ID NO:39: 5' - PO4 - GTG GTG AGC NNS NNS
NNS NNS NNS AAC CCC AAT GAT AAA TAT GAA CCC TT - 3';
and complementary strand LIBRARY-5: SEQ ID NO:40: 5' -
PO4 - GCT CAC CAC GCC - 3'.

The nucleotide sequence of the assembled
LIBRARY-4, -5 and -3 oligonucleotides to be ligated into
the two PflMI sites of the insert, substituting for the
sequence corresponding to FLAG-CONNECT-SENSE/
CONNECT-ANTI-SENSE, is shown below (sense, SEQ ID NO:39;
antisense, SEQ ID NO:40 and SEQ ID NO:30; amino acid,
SEQ ID NO:41):

Pfl M I GTG GTG AGC NNS NNS NNS NNS NNS AAC CCC AAT GAT
    CCG CAC CAC TCG                                 CTA
     G   V   V   S   X   X   X   X   X   N   P   N   D

AAA TAT GAA CCC TT Pfl M I
TTT ATA CTT GG
 K   Y   E   P   F

The DNA sequence is shown in SEQ ID NO:39.
This sequence is the same as SEQ ID NO:31 except that
it has a "silent" mutation that creates another PflMI
restriction site. The DNA sequence (SEQ ID NO:39) is
the sequence that would encode the amino acid sequence
(SEQ ID NO:41) of the chimeric preprotein that is
prolactin signal peptide/FLAG epitope tag/hTHR receptor
amino terminus corresponding to amino acid residues 34
to 96 in the native receptor/hFSH receptor sequence
from amino acid residue 361 to 694 of the native hFSH
receptor. There are three nonsense ("stop") codons
(one in each potential reading frame) in the middle of
the sequence encoding the hTHR receptor amino terminus
that are present to prevent translation of this
precursor sequence if this sequence persisted, that is,
remained uncut, during construction of the final
library. These "stop" codons, therefore, would prevent
translation of non-recombinant protein. To construct
the library, this sequence (SEQ ID NO:39) is cut with
PflMI to excise two small DNA fragments flanked by
PflMI restriction sites, which are replaced with the
following DNA sequences (sense, SEQ ID NO:39;
antisense, SEQ ID NO:40 and SEQ ID NO:30) that encode
the pentapeptide library (SEQ ID NO:41).

EXAMPLE II
Peptide Agonists of hCTR
Fig. 6 illustrates the putative two-dimensional
topology of a human calcitonin receptor
(hCTR). The top of the diagram represents the
extracellular (EC) space, the middle portion represents
the transmembrane (TM) domain, and the bottom portion
represents the intracellular (IC) space. Each circle
represents a single amino acid residue designated by
the single letter code. The residues specifically
referred to in this application are numbered with
regard to hCTR-2. Bold lines demarcate the 3 isoforms:
hCTR-1 - all residues; hCTR-2 - missing the 16 amino
acids between R174 and S175; and hCTR-3 - missing
residues 1-47.

To discover small peptides that can serve as
agonists for hCTR, a combinatorial peptide library was
constructed that expresses random pentapeptides
tethered to the seven TM helical bundle of hCTR. A
pentapeptide library was chosen based on the fact that
TRH is a tripeptide that is blocked at both ends (3+2
(for block) = 5) and the resulting number of clones is
workable. The constructed library contains all 20
natural amino acids at each of the five positions and
therefore has a complexity of 20^5 = 3.2 x 10^6 possible
combinations.
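Whether this number of clones is "workable" depends on how many independent transformants must be screened to sample most of the library. A rough estimate is sketched below in Python. It assumes every pentapeptide is equally represented (which a degenerate-codon library only approximates) and uses the standard sampling formula N = ln(1 - P) / ln(1 - 1/n); the function name is a hypothetical helper, and the formula is a textbook approximation rather than a calculation given in this application.

```python
import math


def clones_for_coverage(library_size: int, probability: float) -> int:
    """Clones to screen so that any given library member is present
    at least once with the requested probability (uniform sampling)."""
    if library_size < 2:
        raise ValueError("library_size must be at least 2")
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    # log1p keeps precision when 1/library_size is very small.
    needed = math.log1p(-probability) / math.log1p(-1.0 / library_size)
    return math.ceil(needed)


PENTAPEPTIDES = 20 ** 5  # 3.2 x 10^6 possible pentapeptides

n95 = clones_for_coverage(PENTAPEPTIDES, 0.95)
n99 = clones_for_coverage(PENTAPEPTIDES, 0.99)
print(n95, n99)
```

Under these assumptions roughly three times the library size must be screened for 95% coverage, which is one reason the transformants are handled as pools rather than as individual clones.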

To this end, the complementary DNA (cDNA)
sequence that normally encodes hCTR's N-terminal EC
domain is substituted by a DNA sequence that encodes
the thrombin receptor's N-terminal ectodomain. The
chimeric ThrR/hCTR has the variable pentapeptide
sequence substituting for the native peptide sequence
that is normally unmasked by thrombin action and
constitutes the ThrR peptide agonist, but it retains
thrombin binding sequences and the thrombin-specific
cleavage site. Therefore, the N-termini of expressed
receptors are cleaved by thrombin at the appropriate
location, exposing a new N-terminus that is made of the
variable pentapeptide segment of the library tethered
to the remainder of hCTR, that is, in a position that
in the native ThrR allows it to serve as an agonist.
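The effect of thrombin on the chimeric N-terminus can be illustrated with a simple string model. The Python sketch below is a simplification: it treats cleavage as a cut immediately after the LDPR motif that precedes the library position, uses a hypothetical ectodomain fragment assembled from the FLAG epitope and the flanking thrombin receptor residues shown in the sequence listings, and `expose_tethered_peptide` is an illustrative name, not part of the described method.

```python
FLAG = "DYKDDDDK"
# Hypothetical fragment: FLAG epitope, thrombin receptor residues up to the
# cleavage site, the five library positions (X), and downstream residues.
ECTODOMAIN_TEMPLATE = FLAG + "LDATLDPRXXXXXNPNDKYEPFWEDEE"
CLEAVAGE_MOTIF = "LDPR"  # the cut is modeled directly after this motif


def expose_tethered_peptide(pentapeptide, template=ECTODOMAIN_TEMPLATE):
    """Insert a library pentapeptide and model thrombin cleavage.

    Returns (released_fragment, new_tethered_n_terminus).
    """
    if len(pentapeptide) != 5:
        raise ValueError("library insert must be five residues long")
    full = template.replace("XXXXX", pentapeptide, 1)
    cut = full.index(CLEAVAGE_MOTIF) + len(CLEAVAGE_MOTIF)
    return full[:cut], full[cut:]


# SFLLR: the residues found at this position in the native human thrombin
# receptor, used here only as an example library member.
released, tethered = expose_tethered_peptide("SFLLR")
print(released)
print(tethered)
```

In this model the FLAG epitope leaves with the released fragment, and the library pentapeptide becomes the first five residues of the receptor-bound N-terminus.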

To monitor for cell surface expression and
for efficient cleavage by thrombin of the N-terminal
end of the chimeric receptors, the FLAG epitope is
positioned proximally to the thrombin cleavage site.
Abs that recognize the ThrR N-terminus distal to the
position corresponding to the library are also used.
Consequently, chimeric receptors expressed on the cell
surface are detectable by the appropriate use of both
Abs before thrombin treatment, but only with Abs
against the distal part after thrombin treatment. This
confirms cell-surface expression and adequacy of
thrombin generation of potential agonists.

The cDNA sequence encoding the new N-terminus
of the chimeric ThrR/hCTR, consisting of a prolactin
leader or signal peptide, followed by the FLAG epitope,
followed by the N-terminus of the mature human ThrR
and the pentapeptide library, is constructed by gene
synthesis. It consists of a DNA segment of
approximately 300 base pairs encoding 100 amino acids
that is ligated in frame through an appropriate
restriction endonuclease cleavage site that is in the
synthetic hCTR-2 cDNA at a position encoding the amino
acids that constitute the transition between the
N-terminus and the first TM domain. After ligation
into a mammalian expression vector, Escherichia coli is
transformed by electroporation and the transformants
are subdivided into pools whose maximal workable
complexity is determined according to the efficiency of
expression and/or sensitivity of the detection system.

The success of expression cloning strategies,
such as the one of the subject invention, is dependent
on the reporter (or detection) system used. An
amplified reporter system is used in accordance with
the subject invention which is based on the second
messenger systems triggered by hCTR. hCTR is a GPCR
that upon activation in COS-1 cells couples through the
G protein, Gs, to the enzyme adenylyl cyclase, causing
an increase in intracellular concentrations of cyclic
AMP (cAMP), and through Gq to the enzyme phospholipase C,
causing increases in inositol 1,4,5-trisphosphate
(IP3), which causes a rise in intracellular free Ca2+,
and 1,2-diacylglycerol (DAG) (Nussenzveig et al. 1994).
cAMP activates cAMP-dependent protein kinase (PKA), an
important intracellular regulator. One PKA substrate
is a transcription factor known as cAMP-response
element binding protein, CREB, that when activated
binds and activates transcription by promoters that
contain regulatory sequences known as cAMP-response
elements (CREs). A similar cascade initiated by IP3,
DAG and an elevation of cytoplasmic Ca2+ that involves
other protein kinases triggers gene induction through
other motifs using other transcription factors, such as
the fos-jun-AP-1 system. This type of reporter system
is able to detect basal as well as CT-stimulated
activation of the wild type hCTR by using a reporter
plasmid containing a minimal promoter of the human c-fos
gene into which a CREB binding motif (Montminy et al.
1990) was engineered, driving transcription of the gene
for the enzyme luciferase (pCRE/LUC), whose activity is
easily detected by a chemiluminescent reaction.
Unfortunately, the use of the enzyme activity of the
luciferase reporter system requires the preparation of
cell extracts and, therefore, monitors induction in a
population of cells. To be able to measure a single
positive in a very large number of negatives from a
library, a single cell assay is needed. Two different
assays are used so as to improve the likelihood of
identifying positive clones.

One assay is based on gene induction in COS-1
cells. β-galactosidase is used as a reporter gene in
transfected COS-1 cells. This assay takes advantage of
the amplification of the enzyme activity of the
reporter, with an easily determined color reaction as
endpoint, and of the over-expression of receptors with
tethered agonists in COS-1 cells because of replication
of the plasmids introduced. These experiments were
performed by co-transfecting portions of the library
and CRE/β-galactosidase or AP-1/β-galactosidase
constructs into COS-1 cells so as to amplify expression
by plasmid replication using the simian virus-40 origin
of replication in the vector. This enhances the
signal/noise ratio substantially. Both promoter types
are used as hCTR-2 transduces signals by both pathways
in COS-1 cells. The signal is further increased
because the construct used has a nuclear localization
signal ligated to the β-galactosidase that allows the
protein to concentrate in the nucleus (Hersh et al.
1995). Single clones that exhibit activation of
chimeric ThrR/hCTR after thrombin addition to cleave
the N-terminus and expose the tethered agonist, as
measured by increased color reaction, are isolated
using sib selection, which consists of successive
subdivision and amplification of positive pools of
clones. The optimal time after thrombin addition to
measure the reporter gene activity is determined, as
this involves a prolonged response on gene induction
and the kinetics of this response vary with different
activators (receptors) and in different cells. The
optimal time will likely be 4 hrs as that was optimal
for sCT stimulation in COS-1 cells co-transfected with
hCTR-2 and luciferase under the control of a
cAMP-responsive promoter (CRE-LUC). Even though hCTR-2
exhibits constitutive activity, gene expression can be
further activated with agonist.
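Sib selection as described above amounts to a repeated subdivide-and-test search. The Python sketch below is only a schematic model: the reporter assay is replaced by a hypothetical `pool_is_positive` callback that returns a noise-free yes/no answer, and the pool size and number of sub-pools are arbitrary assumptions rather than values from this application.

```python
import math


def sib_select(clones, pool_is_positive, n_subpools=10):
    """Narrow a pool of clones down to a single positive clone.

    clones           -- iterable of clone identifiers
    pool_is_positive -- callable taking a list of clones and returning True
                        if the (idealized) reporter assay is positive
    n_subpools       -- number of sub-pools made at each round (>= 2)

    Returns (clone or None, number of screening rounds performed).
    """
    pool = list(clones)
    rounds = 0
    while len(pool) > 1:
        rounds += 1
        size = math.ceil(len(pool) / n_subpools)
        subpools = [pool[i:i + size] for i in range(0, len(pool), size)]
        positives = [sub for sub in subpools if pool_is_positive(sub)]
        if not positives:
            return None, rounds
        pool = positives[0]  # follow the first positive sub-pool
    if pool and pool_is_positive(pool):
        return pool[0], rounds
    return None, rounds


# Example: one activating clone hidden among 1000 transformants.
HIT = 637
clone, rounds = sib_select(range(1000), lambda pool: HIT in pool)
print(clone, rounds)
```

With ten sub-pools per round, a pool of 1000 clones is resolved to a single clone in three rounds in this idealized model; a real assay would also have to allow for false positives and for pools that contain more than one active clone.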

The second reporter system uses Xenopus
laevis oocytes. This system, which was used to clone
the TRH receptor cDNA (Straub et al. 1990), is
dependent on coupling to Gq. In this assay system,
individual oocytes are injected with RNA that was
transcribed in vitro from pools of plasmids from the
library. After one to three days (the optimal time for
responsiveness is determined in preliminary experiments
with hCTR-2), the responsiveness of the oocytes to
thrombin is measured. Thrombin acutely activates
receptors in those oocytes expressing chimeric
constructs in which the peptide can serve as a tethered
agonist. Receptor activation is monitored by acute
effects on a chloride current or on stimulation of
radiocalcium efflux. Both endpoints are rapid,
amplified processes in oocytes. Single clones that
exhibit activation of chimeric ThrR/hCTR after thrombin
addition are isolated using sib selection. Using both
reporter systems allows the determination of whether
rapid or more prolonged effects yield better
signal-to-noise ratios.

Other reporting systems may also be useful in
the cloning strategy, such as an immunofluorescence/
immunocytochemical approach in COS-1 cells that also
relies on gene induction. Commercially available
anti-GFP Abs (Clontech) or anti-β-galactosidase Abs
(Promega Biotech, Inc.) can be used to identify
transfected COS-1 cells in which ectopic gene
expression has been induced. Or, a plasmid can be
constructed in which CRE or AP-1 drives expression of a
cell-surface protein to which Abs are available, such
as nerve growth factor receptor (Johnson et al. 1986),
and FACS sorting can be used to identify positive
cells. Alternatively, a rapid effect of activation of
the ThrR/hCTR in COS-1 cells can be monitored.
An acute elevation in cytoplasmic Ca2+ can be
measured using Fluo-3 (Molecular Probes) and fluorescence
microscopy. Fluorescent calcium
indicators (Geras Raaka and Gershengorn 1987) can be
used.

The strategy in accordance with the subject
invention for the design of the library suits the
purpose for which it is intended to be used because: i)
It removes the putative EC N-terminal domain of hCTR to
which intact, native CT binds with high affinity.
Without hCTR's N-terminus, the possibility of finding a
peptide that acts "indirectly" through hCTR's N-terminus
is eliminated. ii) It produces receptors that activate
only upon addition of thrombin. This allows for
receptors to be active only during the experimental
period, avoiding the cellular counter-regulatory
mechanisms associated with prolonged stimulation
(desensitization, down regulation), which can attenuate
the detecting signal. iii) A tethered agonist
increases the local effective concentration of the
ligand enormously. This reduces the possibility that
peptide antagonists, if present in the same pool of an
untethered peptide library, could interfere with the
detection of peptide agonists.

EXAMPLE III

Peptide Negative Antagonists of a GPCR of HHV-8
This example relates to a newly described
GPCR that is encoded in the genome of human
herpesvirus-8 (HHV-8) (or Kaposi's sarcoma-associated
herpesvirus, KSHV) (Cesarman et al. 1996), which is a
virus that was first identified in Kaposi's sarcoma
(KS) tissues from patients with AIDS and has now been
found in KS tissues from human immunodeficiency virus
(HIV)-negative patients, in tissues from patients with
Castleman's disease and in some B-cell lymphomas (Chang
et al. 1994). This receptor is referred to as HHV8
GPCR. The objective of this example is to identify
peptides that are high affinity negative antagonists of
this constitutively active HHV8 GPCR. A constitutively
active receptor is a receptor that exhibits
agonist-independent signalling activity (Lefkowitz et
al. 1993). Negative antagonists (or inverse agonists)
are compounds that are capable of inhibiting the
signalling activity of a constitutively active receptor
(Schutz and Freissmuth 1992). Neutral antagonists
inhibit the action of agonists but do not affect
agonist-independent activity. Neutral antagonists
would inhibit signalling by HHV8 GPCR when it is
activated by natural or synthetic agonists, for
example, interleukin-8 (IL-8), which has been shown to
activate HHV8 GPCR in Xenopus laevis oocytes, whereas
negative antagonists would inhibit "basal" signalling
by HHV8 GPCR.

The HHV8 GPCR is a protein of 342 amino acids
that has the features of a GPCR including seven
hydrophobic, putative transmembrane-spanning domains
(Cesarman et al. 1996). Fig. 7 illustrates the
putative two-dimensional topology of HHV8 GPCR in the
cell-surface membrane. The top of the diagram
represents the extracellular space (E), the middle
portion represents the transmembrane (TM) domain and
the bottom portion represents the intracellular space
(C). Each circle represents an amino acid residue
designated by the single letter code. The HHV8 GPCR is
a receptor that signals via the
phosphoinositide-inositol 1,4,5-trisphosphate-calcium
cascade (Berridge 1993).

Discovery of peptide negative antagonists of HHV8 GPCR

It is now appreciated that receptors can
attain an active conformation in the absence of agonist
and manifest constitutive, that is, agonist-independent
activity (Lefkowitz et al. 1993). This has led to
renewed acceptance of the concept that receptors can
change conformation spontaneously and oscillate between
active and inactive states (for review, see Leff 1995).
Some drugs, termed negative antagonists or inverse
agonists, appear capable of constraining receptors in
an inactive state (Samama et al. 1994). Negative
antagonism is demonstrated when a drug binds to a
receptor that exhibits constitutive activity and
reduces this activity. It is important to discover
agents that exhibit negative antagonistic properties
toward HHV8 GPCR to use in exploring the role of HHV8
GPCR during HHV-8 infection in studies in cells in
tissue culture and in intact animals.



The subject invention provides a strategy for
discovery of small peptide negative antagonists of HHV8
GPCR. A tethered, combinatorial library is used to
clone pentapeptides that are negative antagonists of
HHV8 GPCR. A pentapeptide library is chosen based on
the fact that small peptides are effective negative
antagonists and the number of clones is workable. The
library contains all 20 natural amino acids at each of
the five positions and therefore has a complexity of
20^5 = 3.2 x 10^6 possible combinations. This approach is
chosen because although there is a good deal of
information available regarding IL-8 binding (see
above), little is known regarding the specific
interactions between IL-8 and IL-8Rs that cause
activation (Leong et al. 1994). In fact, this is true
for GPCRs in general (Van Rhee and Jacobson 1996).
Moreover, there is even less known about specific
interactions that may inactivate a constitutively
active receptor (Schutz and Freissmuth 1992). Thus,
insufficient information is available to "rationally"
design small peptides with negative antagonist
activities. Consequently, discovery of negative antagonist
peptides for HHV8 GPCR may best be accomplished by
using combinatorial peptide libraries. With this
approach, 3.2 million random peptides of five amino
acids in length are tested for activity and those that
inactivate HHV8 GPCRs are identified by sib selection.
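
The library complexity quoted above follows from simple counting: 20 choices
at each of 5 positions. A minimal Python sketch of that arithmetic is given
here; the function names and the one-letter alphabet string are illustrative
assumptions and are not part of the disclosed method.

```python
from itertools import islice, product

# One-letter codes for the 20 natural amino acids (illustrative alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def library_complexity(alphabet_size: int = 20, length: int = 5) -> int:
    """Number of distinct peptides of the given length."""
    return alphabet_size ** length


def iter_peptides(length: int = 5):
    """Lazily enumerate every peptide of the given length."""
    for combo in product(AMINO_ACIDS, repeat=length):
        yield "".join(combo)


complexity = library_complexity()               # 20**5
first_three = list(islice(iter_peptides(), 3))  # first few library members
print(complexity, first_three)
```

Enumerating lazily avoids holding all 3.2 million strings in memory when only
a count or a sample is needed.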

Discovery of high affinity, specific pentapeptide
negative antagonists of HHV8 GPCR

To discover small peptides that can serve as
negative antagonists (or inverse agonists) for HHV8
GPCR, a combinatorial peptide library is constructed
that expresses random pentapeptides tethered to the
seven TM helical bundle of HHV8 GPCR. This strategy is
based on the conclusion that one (or several)
pentapeptides will interact with the TM bundle or
extracellular loops, or both, of HHV8 GPCR in a manner
similar to that by which other small peptide
antagonists interact with other GPCRs, such as
receptors for opioid peptides (Costa and Herz 1989;
Costa et al. 1992) and bradykinin (Leeb-Lundberg et al.
1994), and by a similar mechanism inactivate HHV8 GPCR.

A pentapeptide library is chosen based on the
fact that peptides of this size have been shown to be
negative antagonists of other GPCRs and the resulting
number of clones is workable. The library contains all
20 natural amino acids at each of the five positions
and, therefore, has a complexity of 20^5 = 3.2 x 10^6
possible compounds. The library is constructed by
taking the cDNA sequence of HHV8 GPCR and substituting
the sequence that normally encodes HHV8 GPCR's
N-terminal extracellular domain by a DNA sequence that
encodes the N-terminal ectodomain of the thrombin
receptor (ThrR) from just after the activating peptide
to the beginning of TM-1; that is, the sequence of the
native ThrR from its N-terminus up to and including its
activating peptide, Ser-Phe-Leu-Leu-Arg-Asn (SEQ ID
NO:43: SFLLRN), is deleted. The chimeric ThrR/HHV8 GPCR
primary amino acid sequence begins at its N-terminus
with the variable pentapeptide sequence ("library"),
which substitutes for SFLLRN, followed by the ThrR
amino terminal sequence distal to the SFLLRN sequence
(from immediately after SFLLRN to the beginning of
TM-1), followed by the HHV8 GPCR sequence from the
beginning of TM-1 to the carboxyl end (Fig. 8). The
distal N-terminal sequence of the ThrR is chosen rather
than that of HHV8 GPCR because this sequence allows the
pentapeptide library sequences on each ThrR/HHV8 GPCR
chimera to be directed into the remainder of the
receptor as the exposed N-terminal peptide of ThrR is
guided into the receptor's "body". The major
difference is that the pentapeptide library is the
N-terminus of the ThrR/HHV8 GPCR tethered to the
remainder of the receptor, that is, in a position that
in the native ThrR allows it to serve as an agonist but
allows it in the chimeric receptor to serve as a
negative antagonist. No cleavage is necessary to
expose the N-terminal pentapeptide sequence.
Therefore, the N-termini of expressed receptors are
random pentapeptides that can act as negative
antagonists with regard to the constitutive activity of
HHV8 GPCRs as soon as the chimeric receptor is
expressed. The library is constructed without the need
to cleave off a "blocking" sequence in order to expose
the pentapeptide because it is desirable for the
pentapeptide to inactivate the chimeric receptor as
soon as it is expressed on the cell surface. Thus,
monitoring is for inactivation of a "basal" signalling
activity of the chimeric ThrR/HHV8 GPCR.
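
The order of segments in the chimera described above (library pentapeptide,
then the ThrR sequence distal to SFLLRN, then HHV8 GPCR from TM-1 to the
carboxyl end) can be sketched as a simple string assembly. This is only an
illustration: the function name is hypothetical, and the two receptor
segments are dummy placeholder strings, not the real ThrR or HHV8 GPCR
sequences.

```python
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")  # 20 natural residues

# Hypothetical placeholders only; these are NOT real receptor sequences.
THR_DISTAL_PLACEHOLDER = "T" * 70      # stands in for ThrR after SFLLRN up to TM-1
HHV8_FROM_TM1_PLACEHOLDER = "H" * 30   # stands in for HHV8 GPCR from TM-1 to C-terminus


def build_chimera(pentapeptide: str,
                  thr_distal: str = THR_DISTAL_PLACEHOLDER,
                  gpcr_from_tm1: str = HHV8_FROM_TM1_PLACEHOLDER) -> str:
    """Return library peptide + ThrR distal segment + GPCR body, N- to C-terminus."""
    peptide = pentapeptide.upper()
    if len(peptide) != 5 or not set(peptide) <= AMINO_ACIDS:
        raise ValueError("expected five one-letter codes of natural amino acids")
    # The library peptide is the free N-terminus, so no cleavage step is modelled.
    return peptide + thr_distal + gpcr_from_tm1


chimera = build_chimera("ACDEF")
print(chimera[:5], len(chimera))
```

Because the pentapeptide occupies the position of SFLLRN at the very
N-terminus, the six-residue native activating peptide is rejected by the
length check rather than silently accepted.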

Fig. 8 shows the putative topology of the
chimera ThrR/HHV8 GPCR as it is predicted to be in the
cell surface membrane of transfected COS-1 cells. The
top of the diagram represents the extracellular space
(E), the middle portion represents the transmembrane
(TM) domain and the bottom portion represents the
intracellular space (C). The first five filled circles
represent individual amino acids that are part of the
pentapeptide library; each filled circle represents any
one of the 20 amino acids. The seventy unfilled circles
represent the individual amino acid residues of the
native ThrR sequence from just after the activating
peptide (SFLLRN) to the beginning of TM-1. Each circle
with a letter in it represents an amino acid residue of
HHV8 GPCR designated by the single letter code.

To monitor for cell surface expression of the
chimeric receptors, antibodies to the extracellular
domain of HHV8 GPCR are used, specifically, antibodies
to the large extracellular loop 2.

The cDNA sequence encoding the new N-terminus
of the chimeric ThrR/HHV8 GPCR, consisting of a
prolactin leader (or signal) peptide, which is cleaved
after directing the protein to the cell surface
membrane, followed by the pentapeptide library and the
distal sequence of the N-terminus of ThrR, is
constructed by gene synthesis. It consists of a DNA
segment of approximately 210 base pairs encoding 70
amino acids that is ligated in frame through an
appropriate restriction endonuclease cleavage site that
is created in the HHV8 GPCR cDNA at a position encoding
the amino acids that constitute the transition between
the N-terminus and the first TM domain. After ligation
into a mammalian expression vector, Escherichia coli is
transformed by electroporation and the transformants
are subdivided into pools whose maximal workable
complexity is determined according to the efficiency of
mammalian cell transfection and/or sensitivity of the
detection system(s).
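
As a rough illustration of the pooling step, the number of pools needed to
hold the whole library at a chosen pool complexity can be estimated by
ceiling division. The pool complexities used below (1,000 and 3,000 clones
per pool) are hypothetical values chosen only for the example; the text
states that the workable value is set by transfection efficiency and
detection sensitivity.

```python
import math


def pools_needed(library_size: int, max_pool_complexity: int) -> int:
    """Smallest number of pools such that no pool exceeds the given complexity."""
    if library_size <= 0 or max_pool_complexity <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(library_size / max_pool_complexity)


LIBRARY_SIZE = 20 ** 5  # 3.2 million pentapeptides

# Hypothetical pool complexities, for illustration only.
for complexity in (1_000, 3_000):
    print(complexity, pools_needed(LIBRARY_SIZE, complexity))
```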

The success of expression cloning strategies,
such as the one of the subject invention, is dependent
on the reporter (or detection) system. An amplified
reporter system is used in accordance with the subject
invention which is based on the second messenger system
triggered by HHV8 GPCR. HHV8 GPCR is a GPCR that in
COS-1 cells appears to couple through a G protein to
the enzyme phospholipase C causing generation of the
second messengers inositol 1,4,5-trisphosphate (IP3),
which causes a rise in intracellular Ca2+, and
1,2-diacylglycerol, which activates protein kinase C
(Nussenzveig et al. 1994). Activated protein kinase C
triggers gene induction through specific motifs using
transcription factors, such as the fos-jun-AP-1 system
(Deutsch et al. 1990; Schadlow et al. 1992). This
reporter system works in COS-1 cells since constitutive
activity of HHV8 GPCR is detected using a reporter
plasmid containing a minimal promoter of the human
c-fos gene into which an AP-1 binding motif is
engineered driving transcription of the gene for the
enzyme luciferase (pAP-1/LUC), whose activity is
detected by a chemiluminescent reaction.
Unfortunately, the use of the enzyme activity of the
luciferase reporter system requires the preparation of
cell extracts and, therefore, monitors induction in a
population of cells. To be able to identify receptors
that are turned off by negative antagonistic activity
of the tethered pentapeptide, a single "hit" in a very
large number of negatives needs to be measured.
Therefore, a single cell assay is needed. For the
reporter, luciferase is replaced with β-galactosidase
that can be readily measured in individual cells.

A two-reporter system was devised for
discovery of negative antagonists that uses gene
induction in COS-1 cells. β-galactosidase is used as a
reporter enzyme in transfected COS-1 cells. This assay
takes advantage of the amplification of the enzyme
activity of the reporter, with an easily determined
color reaction as endpoint, and of the over-expression
of receptors with tethered negative antagonists in
COS-1 cells because of replication of the plasmids
introduced. These experiments are performed by
co-transfecting portions of the plasmid library and a
plasmid encoding AP-1/β-galactosidase constructs into
COS-1 cells so as to amplify expression by plasmid
replication using the simian virus-40 origin of
replication in the vector. This enhances the
signal/noise ratio substantially. The signal is
further increased because the construct used has a
nuclear localization signal ligated to the
β-galactosidase that allows the protein to concentrate
in the nucleus (Hersh et al. 1995). The construct
containing β-galactosidase with a nuclear localization
signal was shown to express in the nucleus of
transfected COS-1 cells. Single clones that exhibit
negative antagonistic activity, as measured by
decreased color reaction, are isolated using sib
selection, which consists of successive subdivision and
amplification of positive pools of clones. The optimal
time after transfection to assay β-galactosidase
activity is determined empirically as this involves a
prolonged response on gene induction and the kinetics
of this response vary with different activators
(receptors) and in different cells.
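
Sib selection as described above, successive subdivision of positive pools
until single clones remain, can be sketched as follows. This is a
simplified model under stated assumptions: `is_pool_positive` is a
hypothetical stand-in for the actual assay readout (here, a pool counts as
positive when it contains a clone with negative antagonist activity), and
the clone names and the choice of ten sub-pools per round are invented for
illustration.

```python
import random


def sib_select(clones, is_pool_positive, n_subpools=10, seed=0):
    """Return the individual clones found by repeatedly splitting positive pools."""
    rng = random.Random(seed)
    hits = []
    pending = [list(clones)]
    while pending:
        pool = pending.pop()
        if not is_pool_positive(pool):
            continue                      # negative pools are discarded
        if len(pool) == 1:
            hits.append(pool[0])          # a single positive clone is isolated
            continue
        rng.shuffle(pool)
        k = min(n_subpools, len(pool))
        # Split the positive pool into k smaller, non-empty sub-pools.
        pending.extend(pool[i::k] for i in range(k))
    return hits


# Hypothetical example: 1,000 clones, two of which carry an active peptide.
all_clones = [f"clone_{i}" for i in range(1000)]
active = {"clone_123", "clone_777"}
found = sib_select(all_clones, lambda pool: any(c in active for c in pool))
print(sorted(found))
```

In practice each round also involves amplifying the positive pool before
it is subdivided; that step has no counterpart in this sketch.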

A second reporter gene is used to identify
cells that have been transfected and are expressing
foreign proteins to distinguish them from cells that
have not been transfected. This distinction is crucial
for this approach because cells that have the capacity
to express the specific reporter gene but do not (or in
which expression has been diminished) because
transcription has been inhibited must be differentiated
from cells that do not express the reporter gene
because they are not transfected. Because the
β-galactosidase activity is expressed in the nucleus,
it has a different localization than the nonspecific
reporter of transfection. The nonspecific reporter of
transfection is a construct containing a mutant of the
human placental alkaline phosphatase gene (Tate et al.
1990) that is targeted to the cytoplasm under the
control of a cytomegalovirus promoter; this promoter is
not affected by HHV8 GPCR and is active in all
transfected cells. Thus, one can monitor for 3 types of
cells: 1) cells in which β-galactosidase is expressed
at high levels in the nucleus and alkaline phosphatase
is expressed in the cytoplasm - these are transfected
cells that do not express receptors that contain a
peptide that has negative antagonistic activity because
expression of β-galactosidase is induced by the
constitutive signalling activity of HHV8 GPCR; 2) cells
in which β-galactosidase is not expressed in the
nucleus and alkaline phosphatase is not expressed in
the cytoplasm - these are cells that have not been
transfected; and 3) cells in which β-galactosidase is
not expressed (or is expressed at low levels in the
nucleus) and alkaline phosphatase is expressed in the
cytoplasm - these are transfected cells that express
receptors that contain a peptide that has negative
antagonistic activity.
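
The three cell types listed above amount to a small decision table over the
two reporters. The sketch below encodes that table; the names are
hypothetical, the two boolean inputs are an idealization of what would in
practice be graded staining intensities, and the fourth combination (nuclear
β-galactosidase without cytoplasmic alkaline phosphatase) is not described
in the text and is therefore only flagged.

```python
from enum import Enum


class CellClass(Enum):
    TRANSFECTED_NO_NEGATIVE_ANTAGONIST = 1  # type 1 in the text
    NOT_TRANSFECTED = 2                     # type 2 in the text
    TRANSFECTED_NEGATIVE_ANTAGONIST = 3     # type 3 in the text
    NOT_DESCRIBED = 4                       # combination not covered by the text


def classify_cell(nuclear_bgal_high: bool, cytoplasmic_ap: bool) -> CellClass:
    """Classify one cell from its two reporter readouts."""
    if cytoplasmic_ap and nuclear_bgal_high:
        return CellClass.TRANSFECTED_NO_NEGATIVE_ANTAGONIST
    if cytoplasmic_ap:
        return CellClass.TRANSFECTED_NEGATIVE_ANTAGONIST
    if not nuclear_bgal_high:
        return CellClass.NOT_TRANSFECTED
    return CellClass.NOT_DESCRIBED


print(classify_cell(nuclear_bgal_high=False, cytoplasmic_ap=True).name)
```

Only cells of the third class are of interest for sib selection; the second
reporter is what keeps untransfected cells from being scored as hits.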

Other reporting systems may also be useful in
the cloning strategy, such as the yeast bioassay system
discussed above or an immunofluorescence/
immunocytochemical approach in COS-1 cells that also
relies on gene induction. Commercially available
anti-β-galactosidase antibodies (Promega Biotech, Inc.)
can be used to identify transfected COS-1 cells in
which ectopic gene expression has been modulated. Or,
a plasmid can be constructed in which AP-1 drives the
expression of a cell-surface protein to which antibodies
are available, such as the nerve growth factor receptor
(Johnson et al. 1986).

The strategy devised in the design of the
library suits the purpose for which it is intended to
be used, because a tethered negative antagonist
increases the local effective concentration of the
ligand enormously. This also reduces the possibility
that neutral antagonists or agonists, if present in the
same pool of an untethered peptide library, could
interfere with the detection of peptide negative
antagonists.

Although preferred embodiments have been
depicted and described in detail herein, it will be
apparent to those of ordinary skill in the relevant art
that various modifications, additions, substitutions
and the like can be made without departing from the
spirit of the invention and these are therefore
considered to be within the scope of the invention as
defined in the claims which follow.

SEQUENCE LISTING



(1) GENERAL INFORMATION:

(i) APPLICANT: Cornell Research Foundation, Inc. 

(ii) TITLE OF INVENTION: STRATEGY TO CLONE DRUGS FOR G
PROTEIN COUPLED RECEPTORS

(iii) NUMBER OF SEQUENCES: 44 

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: NIXON, HARGRAVE, DEVANS & DOYLE LLP

(B) STREET: Clinton Square, P.O. Box 1051

(C) CITY: Rochester 

(D) STATE: New York 

(E) COUNTRY: USA 

(F) ZIP: 14603 

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 08/795,876 

(B) FILING DATE: 06-FEB-1997 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Weyand, Karla M. 

(B) REGISTRATION NUMBER: 40,223 

(C) REFERENCE /DOCKET NUMBER: 19603/1281 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 716-263-1508 

(B) TELEFAX: 716-263-1600 



(2) INFORMATION FOR SEQ ID N0:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 63 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

Leu Asp Ala Thr Leu Asp Pro Arg Ser Phe Leu Leu Arg Asn Pro Asn
1 5 10 15

Asp Lys Tyr Glu Pro Phe Trp Glu Asp Glu Glu Lys Asn Glu Ser Gly
20 25 30

Leu Thr Glu Tyr Arg Leu Val Ser Ile Asn Lys Ser Ser Pro Leu Gln
35 40 45

Lys Gln Leu Pro Ala Phe Ile Ser Glu Asp Ala Ser Gly Tyr Leu
50 55 60


(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 436 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID N0:2: 

Met Asp Ser Lys Gly Ser Ser Gln Lys Gly Ser Arg Leu Leu Leu Leu
1 5 10 15

Leu Val Val Ser Asn Leu Leu Leu Cys Gln Gly Val Val Ser Asp Tyr
20 25 30

Lys Asp Asp Asp Asp Lys Leu Asp Ala Thr Leu Asp Pro Arg Xaa Xaa
35 40 45

Xaa Xaa Xaa Asn Pro Asn Asp Lys Tyr Glu Pro Phe Trp Glu Asp Glu
50 55 60

Glu Lys Asn Glu Ser Gly Leu Thr Glu Tyr Arg Leu Val Ser Ile Asn
65 70 75 80

Lys Ser Ser Pro Leu Gln Lys Gln Leu Pro Ala Phe Ile Ser Glu Asp
85 90 95

Ala Ser Gly Tyr Leu Gly Tyr Asn Ile Leu Arg Val Leu Ile Trp Phe
100 105 110

Ile Ser Ile Leu Ala Ile Thr Gly Asn Ile Ile Val Leu Val Ile Leu
115 120 125

Thr Thr Ser Gln Tyr Lys Leu Thr Val Pro Arg Phe Leu Met Cys Asn
130 135 140

Leu Ala Phe Ala Asp Leu Cys Ile Gly Ile Tyr Leu Leu Leu Ile Ala
145 150 155 160

Ser Val Asp Ile His Thr Lys Ser Gln Tyr His Asn Tyr Ala Ile Asp
165 170 175

Trp Gln Thr Gly Ala Gly Cys Asp Ala Ala Gly Phe Phe Thr Val Phe
180 185 190

Ala Ser Glu Leu Ser Val Tyr Thr Leu Thr Ala Ile Thr Leu Glu Arg
195 200 205

Trp His Thr Ile Thr His Ala Met Gln Leu Asp Cys Lys Val Gln Leu
210 215 220

Arg His Ala Ala Ser Val Met Val Met Gly Trp Ile Phe Ala Phe Ala
225 230 235 240

Ala Ala Leu Phe Pro Ile Phe Gly Ile Ser Ser Tyr Met Lys Val Ser
245 250 255

Ile Cys Leu Pro Met Asp Ile Asp Ser Pro Leu Ser Gln Leu Tyr Val
260 265 270

Met Ser Leu Leu Val Leu Asn Val Leu Ala Phe Val Val Ile Cys Gly
275 280 285

Cys Tyr Ile His Ile Tyr Leu Thr Val Arg Asn Pro Asn Ile Val Ser
290 295 300

Ser Ser Ser Asp Thr Arg Ile Ala Lys Arg Met Ala Met Leu Ile Phe
305 310 315 320

Thr Asp Phe Leu Cys Met Ala Pro Ile Ser Phe Phe Ala Ile Ser Ala
325 330 335

Ser Leu Lys Val Pro Leu Ile Thr Val Ser Lys Ala Lys Ile Leu Leu
340 345 350

Val Leu Phe His Pro Ile Asn Ser Cys Ala Asn Pro Phe Leu Tyr Ala
355 360 365

Ile Phe Thr Lys Asn Phe Arg Arg Asp Phe Phe Ile Leu Leu Ser Lys
370 375 380

Cys Gly Cys Tyr Glu Met Gln Ala Gln Ile Tyr Arg Thr Glu Thr Ser
385 390 395 400

Ser Thr Val His Asn Thr His Pro Arg Asn Gly His Cys Ser Ser Ala
405 410 415

Pro Arg Val Thr Asn Gly Ser Thr Tyr Ile Leu Val Pro Leu Ser His
420 425 430

Leu Ala Gln Asn
435

(2) INFORMATION FOR SEQ ID N0:3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Met Asp Ser Lys Gly Ser Ser Gln Lys Gly Ser Arg Leu Leu Leu Leu
1 5 10 15

Leu Val Val Ser Asn Leu Leu Leu Cys Gln Gly Val Val Ser
20 25 30

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID N0:4: 

Asp Tyr Lys Asp Asp Asp Asp Lys 
1 5

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Xaa Xaa Xaa Xaa Xaa 
1 5 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
AATTCCACCA TGGACTCCAA GGGCTCGAGC CAGAAGGGAT CTAGACTGCT
(2) INFORMATION FOR SEQ ID N0:7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
CAGCAGCAGT CTAGATCCCT TCTGGCTCGA GCCCTTGGAG TCCATGGTGG
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
GCTGCTGCTG GTGGTGAGCA ACCTGCTGCT GTGCCAGGGC GTCGTG
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
CGCTCACGAC GCCCTGGCAC AGCAGCAGGT TGCTCACCAC CAGCAG



WO 98/34948

PCT/US98/02377



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
AGCGACTACA AGGACGACGA CGACAAGCTT CCTGCCTTTT 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
CGAAAAGGCA GGAAGCTTGT CGTCGTCGTC CTTGTAGT
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
TATGCCACCT TTTGGGAGGA TGAGGAGAAA AATGAAAGTG GGTTAACTGA ATAC
(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 56 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
TGAAGAGGAC TGCTTTTATT GATGGAGACT AATCTGTATT CAGTTAACCC
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 55 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
CAATAAAAGC AGTCCTCTTC AAAAACAACT TCCTGCATTC ATCTCAGAAG ATGCC 
(2) INFORMATION FOR SEQ ID N0:15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
GTCAGGTAAC CGGAGGCATC TTCTGAGATG AATGCAAG
(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
TGAAGGTTAC CTGGGGTACA ACATCCTCAG AGTCC 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
TCACGCGGCC GCTTAGTTTT GGGCTAAATG ACTTAGAGG 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

AATTCGGCTT 10
(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

AGCCG 

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
CAGTCGATCG CATAGTTGTG ATATTGGCTC 
(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
AACAGCTATG ACCATG
(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
CCATCCGCGG AATGGCCACT GCTCTTCAGC 
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
GTAAAACGAC GGCCAGT 

(2) INFORMATION FOR SEQ ID NO:24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: 
TATGCGATCG ACTGGCAAAC TGGGGCAGG
(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
CATTCCGCGG ATGGGTGTTG TGGACAGTG 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
AGCTTGATGC CACGCTATGG CCCTAGGTAA GTGATATGCC ACCTT 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
GTGGCATATC ACTTACCTAG GGCCATAGCG TGGCATCA
(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
AGATCCCCGG AACCCCAATG ATAAATATGA ACCCTT
(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
CCGGGGATCT AGC 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
GGTTCATATT TATC
(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1300 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

AATTCCACCA TGGACTCCAA GGGCTCGAGC CAGAAGGGAT CTAGACTGCT GCTGCTGCTG 60
GTGGTGAGCA ACCTGCTGCT GTGCCAGGGC GTCGTGAGCG ACTACAAGGA CGACGACGAC 120
AAGCTTGATG CCACGCTATG GCCCTAGGTA AGTGATATGC CACCTTTTGG GAGGATGAGG 180
AGAAAAATGA AAGTGGGTTA ACTGAATACA GATTAGTCTC CATCAATAAA AGCAGTCCTC 240
TTCAAAAACA ACTTCCTGCA TTCATCTCAG AAGATGCCTC CGGTTACCTG GGGTACAACA 300
TCCTCAGAGT CCTGATATGG TTTATCAGCA TCCTGGCCAT CACTGGGAAC ATCATAGTGC 360
TAGTGATCCT AACTACCAGC CAATATAAAC TCACAGTCCC CAGGTTCCTT ATGTGCAACC 420
TGGCCTTTGC TGATCTCTGC ATTGGAATCT ACCTGCTGCT CATTGCATCA GTTGATATCC 480
ATACCAAGAG CCAATATCAC AACTATGCGA TCGACTGGCA AACTGGGGCA GGCTGTGATG 540
CTGCTGGCTT TTTCACTGTC TTTGCCAGTG AGCTGTCAGT CTACACTCTG ACAGCTATCA 600
CCTTGGAAAG ATGGCATACC ATCACGCATG CCATGCAGCT GGACTGCAAG GTGCAGCTCC 660
GCCATGCTGC CAGTCTCATG GTGATGGGCT GGATTTTTGC TTTTGCAGCT GCCCTCTTTC 720
CCATCTTTGG CATCAGCAGC TACATGAAGG TGAGCATCTG CCTGCCCATG GATATTGACA 780
GCCCTTTGTC ACAGCTGTAT GTCATGTCCC TCCTTGTGCT CAATGTCCTG GCCTTTGTGG 840
TCATCTGTGG CTGCTATATC CACATCTACC TCACAGTGCG GAACCCCAAC ATCGTGTCCT 900
CCTCTAGTGA CACCAGGATC GCCAAGCGCA TGGCCATGCT CATCTTCACT GACTTCCTCT 960
GCATGGCACC CATTTCTTTC TTTGCCATTT CTGCCTCCCT CAAGGTGCCC CTCATCACTG 1020
TGTCCAAAGC AAAGATTCTG CTGGTTCTGT TTCACCCCAT CAACTCCTGT GCCAACCCCT 1080
TCCTCTATGC CATCTTTACC AAAAACTTTC GCAGAGATTT CTTCATTCTG CTGAGCAAGT 1140
GTGGCTGCTA TGAAATGCAA GCCCAAATTT ATAGGACAGA AACTTCATCC ACTGTCCACA 1200
ACACCCATCC GCGGAATGGC CACTGCTCTT CAGCTCCCAG AGTCACCAAT GGTTCCACTT 1260
ACATACTTGT CCCTCTAAGT CATTTAGCCC AAAACTAAGC 1300
(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1298 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 



GGCCGCTTAG TTTTGGGCTA AATGACTTAG AGGGACAAGT ATGTAAGTGG AACCATTGGT 60
GACTCTGGGA GCTGAAGAGC AGTGGCCATT CCGCGGATGG GTGTTGTGGA CAGTGGATGA 120
AGTTTCTGTC CTATAAATTT GGGCTTGCAT TTCATAGCAG CCACACTTGC TCAGCAGAAT 180
GAAGAAATCT CTGCGAAAGT TTTTGGTAAA GATGGCATAG AGGAAGGGGT TGGCACAGGA 240
GTTGATGGGG TGAAACAGAA CCAGCAGAAT CTTTGCTTTG GACACAGTGA TGAGGGGCAC 300
CTTGAGGGAG GCAGAAATGG CAAAGAAAGA AATGGGTGCC ATGCAGAGGA AGTCAGTGAA 360
GATGAGCATG GCCATGCGCT TGGCGATCCT GGTGTCACTA GAGGAGGACA CGATGTTGGG 420
GTTCCGCACT GTGAGGTAGA TGTGGATATA GCAGCCACAG ATGACCACAA AGGCCAGGAC 480
ATTGAGCACA AGGAGGGACA TGACATACAG CTGTGACAAA GGGCTGTCAA TATCCATGGG 540
CAGGCAGATG CTCACCTTCA TGTAGCTGCT GATGCCAAAG ATGGGAAAGA GGGCAGCTGC 600
AAAAGCAAAA ATCCAGCCCA TCACCATGAC ACTGGCAGCA TGGCGGAGCT GCACCTTGCA 660
GTCCAGCTGC ATGGCATGCG TGATGGTATG CCATCTTTCC AAGGTGATAG CTGTCAGAGT 720
GTAGACTGAC AGCTCACTGG CAAAGACAGT GAAAAAGCCA GCAGCATCAC AGCCTGCCCC 780
AGTTTGCCAG TCGATCGCAT AGTTGTGATA TTGGCTCTTG GTATGGATAT CAACTGATGC 840
AATGAGCAGC AGGTAGATTC CAATGCAGAG ATCAGCAAAG GCCAGGTTGC ACATAAGGAA 900
CCTGGGGACT GTGAGTTTAT ATTGGCTGGT AGTTAGGATC ACTAGCACTA TGATGTTCCC 960
AGTGATGGCC AGGATGCTGA TAAACCATAT CAGGACTCTG AGGATGTTGT ACCCCAGGTA 1020
ACCGGAGGCA TCTTCTGAGA TGAATGCAGG AAGTTGTTTT TGAAGAGGAC TGCTTTTATT 1080
GATGGAGACT AAACTGTATT CAGTTAACCC ACTTTCATTT TTCTCCTCAT CCTCCCAAAA 1140
GGGCATATCA CTTACCTAGG GCCATAGCGT GGCATCAAGC TTGTCGTCGT CGTCCTTGTA 1200
GTCGCTCACG ACGCCCTGGC ACAGCAGCAG GTTGCTCACC ACCAGCAGCA GCAGCAGTCT 1260
AGATCCCTTC TGGCTCGAGC CCTTGGAGTC CATGGTGG 1298
(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 420 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



.(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 

Met Asp Ser Lys Gly Ser Ser Gln Lys Gly Ser Arg Leu Leu Leu Leu
1 5 10 15

Leu Val Val Ser Asn Leu Leu Leu Cys Gln Gly Val Val Ser Asp Tyr
20 25 30

Lys Asp Asp Asp Asp Lys Leu Asp Ala Thr Leu Phe Trp Glu Asp Glu
35 40 45

Glu Lys Asn Glu Ser Gly Leu Thr Glu Tyr Arg Leu Val Ser Ile Asn
50 55 60

Lys Ser Ser Pro Leu Gln Lys Gln Leu Pro Ala Phe Ile Ser Glu Asp
65 70 75 80

Ala Ser Gly Tyr Leu Gly Tyr Asn Ile Leu Arg Val Leu Ile Trp Phe
85 90 95

Ile Ser Ile Leu Ala Ile Thr Gly Asn Ile Ile Val Leu Val Ile Leu
100 105 110

Thr Thr Ser Gln Tyr Lys Leu Thr Val Pro Arg Phe Leu Met Cys Asn
115 120 125

Leu Ala Phe Ala Asp Leu Cys Ile Gly Ile Tyr Leu Leu Leu Ile Ala
130 135 140

Ser Val Asp Ile His Thr Lys Ser Gln Tyr His Asn Tyr Ala Ile Asp
145 150 155 160

Trp Gln Thr Gly Ala Gly Cys Asp Ala Ala Gly Phe Phe Thr Val Phe
165 170 175

Ala Ser Glu Leu Ser Val Tyr Thr Leu Thr Ala Ile Thr Leu Glu Arg
180 185 190

Trp His Thr Ile Thr His Ala Met Gln Leu Asp Cys Lys Val Gln Leu
195 200 205

Arg His Ala Ala Ser Val Met Val Met Gly Trp Ile Phe Ala Phe Ala
210 215 220

Ala Ala Leu Phe Pro Ile Phe Gly Ile Ser Ser Tyr Met Lys Val Ser
225 230 235 240

Ile Cys Leu Pro Met Asp Ile Asp Ser Pro Leu Ser Gln Leu Tyr Val
245 250 255

Met Ser Leu Leu Val Leu Asn Val Leu Ala Phe Val Val Ile Cys Gly
260 265 270

Cys Tyr Ile His Ile Tyr Leu Thr Val Arg Asn Pro Asn Ile Val Ser
275 280 285

Ser Ser Ser Asp Thr Arg Ile Ala Lys Arg Met Ala Met Leu Ile Phe
290 295 300

Thr Asp Phe Leu Cys Met Ala Pro Ile Ser Phe Phe Ala Ile Ser Ala
305 310 315 320

Ser Leu Lys Val Pro Leu Ile Thr Val Ser Lys Ala Lys Ile Leu Leu
325 330 335

Val Leu Phe His Pro Ile Asn Ser Cys Ala Asn Pro Phe Leu Tyr Ala
340 345 350

Ile Phe Thr Lys Asn Phe Arg Arg Asp Phe Phe Ile Leu Leu Ser Lys
355 360 365

Cys Gly Cys Tyr Glu Met Gln Ala Gln Ile Tyr Arg Thr Glu Thr Ser
370 375 380

Ser Thr Val His Asn Thr His Pro Arg Asn Gly His Cys Ser Ser Ala
385 390 395 400

Pro Arg Val Thr Asn Gly Ser Thr Tyr Ile Leu Val Pro Leu Ser His
405 410 415

Leu Ala Gln Asn
420

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 85 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

Met Asp Ser Lys Gly Ser Ser Gln Lys Gly Ser Arg Leu Leu Leu Leu
1 5 10 15

Leu Val Val Ser Asn Leu Leu Leu Cys Gln Gly Val Val Ser Xaa Xaa
20 25 30

Xaa Xaa Xaa Asn Pro Asn Asp Lys Tyr Glu Pro Phe Trp Glu Asp Glu
35 40 45

Glu Lys Asn Glu Ser Gly Leu Thr Glu Tyr Arg Leu Val Ser Ile Asn
50 55 60

Lys Ser Ser Pro Leu Gln Lys Gln Leu Pro Ala Phe Ile Ser Glu Asp
65 70 75 80

Ala Ser Gly Tyr Leu
85

(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
ATCAAGCTTG TCGTCGTCGT CCTTGTAaTC GCTCACCACG CCCTG
(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1300 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: 



AATTCCACCA TGGACTCCAA GGGCTCGAGC CAGAAGGGAT CTAGACTGCT GCTGCTGCTG 60

GTGGTGAGCA ACCTGCTGCT GTGCCAGGGC GTGGTGAGCG ACTACAAGGA CGACGACGAC 120

AAGCTTGATG CCACGCTATG GCCCTAGGTA AGTGATATGC CACCTTTTGG GAGGATGAGG 180

AGAAAAATGA AAGTGGGTTA ACTGAATACA GATTAGTCTC CATCAATAAA AGCAGTCCTC 240

TTCAAAAACA ACTTCCTGCA TTCATCTCAG AAGATGCCTC CGGTTACCTG GGGTACAACA 300

TCCTCAGAGT CCTGATATGG TTTATCAGCA TCCTGGCCAT CACTGGGAAC ATCATAGTGC 360

TAGTGATCCT AACTACCAGC CAATATAAAC TCACAGTCCC CAGGTTCCTT ATGTGCAACC 420

TGGCCTTTGC TGATCTCTGC ATTGGAATCT ACCTGCTGCT CATTGCATCA GTTGATATCC 480

ATACCAAGAG CCAATATCAC AACTATGCGA TCGACTGGCA AACTGGGGCA GGCTGTGATG 540

CTGCTGGCTT TTTCACTGTC TTTGCCAGTG AGCTGTCAGT CTACACTCTG ACAGCTATCA 600

CCTTGGAAAG ATGGCATACC ATCACGCATG CCATGCAGCT GGACTGCAAG GTGCAGCTCC 660

GCCATGCTGC CAGTGTCATG GTGATGGGCT GGATTTTTGC TTTTGCAGCT GCCCTCTTTC 720

CCATCTTTGG CATCAGCAGC TACATGAAGG TGAGCATCTG CCTGCCCATG GATATTGACA 780

GCCCTTTGTC ACAGCTGTAT GTCATGTCCC TCCTTGTGCT CAATGTCCTG GCCTTTGTGG 840

TCATCTGTGG CTGCTATATC CACATCTACC TCACAGTGCG GAACCCCAAC ATCGTGTCCT 900

CCTCTAGTGA CACCAGGATC GCCAAGCGCA TGGCCATGCT CATCTTCACT GACTTCCTCT 960

GCATGGCACC CATTTCTTTC TTTGCCATTT CTGCCTCCCT CAAGGTGCCC CTCATCACTG 1020

TGTCCAAAGC AAAGATTCTG CTGGTTCTGT TTCACCCCAT CAACTCCTGT GCCAACCCCT 1080

TCCTCTATGC CATCTTTACC AAAAACTTTC GCAGAGATTT CTTCATTCTG CTGAGCAAGT 1140

GTGGCTGCTA TGAAATGCAA GCCCAAATTT ATAGGACAGA AACTTCATCC ACTGTCCACA 1200

ACACCCATCC GCGGAATGGC CACTGCTCTT CAGCTCCCAG AGTCACCAAT GGTTCCACTT 1260

ACATACTTGT CCCTCTAAGT CATTTAGCCC AAAACTAAGC 1300
(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1300 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

GGCCGCTTAG TTTTGGGCTA AATGACTTAG AGGGACAAGT ATGTAAGTGG AACCATTGGT 60

GACTCTGGGA GCTGAAGAGC AGTGGCCATT CCGCGGATGG GTGTTGTGGA CAGTGGATGA 120

AGTTTCTGTC CTATAAATTT GGGCTTGCAT TTCATAGCAG CCACACTTGC TCAGCAGAAT 180

GAAGAAATCT CTGCGAAAGT TTTTGGTAAA GATGGCATAG AGGAAGGGGT TGGCACAGGA 240

GTTGATGGGG TGAAACAGAA CCAGCAGAAT CTTTGCTTTG GACACAGTGA TGAGGGGCAC 300

CTTGAGGGAG GCAGAAATGG CAAAGAAAGA AATGGGTGCC ATGCAGAGGA AGTCAGTGAA 360

GATGAGCATG GCCATGCGCT TGGCGATCCT GGTGTCACTA GAGGAGGACA CGATGTTGGG 420

GTTCCGCACT GTGAGGTAGA TGTGGATATA GCAGCCACAG ATGACCACAA AGGCCAGGAC 480

ATTGAGCACA AGGAGGGACA TGACATACAG CTGTGACAAA GGGCTGTCAA TATCCATGGG 540

CAGGCAGATG CTCACCTTCA TGTAGCTGCT GATGCCAAAG ATGGGAAAGA GGGCAGCTGC 600

AAAAGCAAAA ATCCAGCCCA TCACCATGAC ACTGGCAGCA TGGCGGAGCT GCACCTTGCA 660

GTCCAGCTGC ATGGCATGCG TGATGGTATG CCATCTTTCC AAGGTGATAG CTGTCAGAGT 720

GTAGACTGAC AGCTCACTGG CAAAGACAGT GAAAAAGCCA GCAGCATCAC AGCCTGCCCC 780

AGTTTGCCAG TCGATCGCAT AGTTGTGATA TTGGCTCTTG GTATGGATAT CAACTGATGC 840

AATGAGCAGC AGGTAGATTC CAATGCAGAG ATCAGCAAAG GCCAGGTTGC ACATAAGGAA 900

CCTGGGGACT GTGAGTTTAT ATTGGCTGGT AGTTAGGATC ACTAGCACTA TGATGTTCCC 960

AGTGATGGCC AGGATGCTGA TAAACCATAT CAGGACTCTG AGGATGTTGT ACCCCAGGTA 1020

ACCGGAGGCA TCTTCTGAGA TGAATGCAGG AAGTTGTTTT TGAAGAGGAC TGCTTTTATT 1080

GATGGAGACT AAACTGTATT CAGTTAACCC ACTTTCATTT TTCTCCTCAT CCTCCCAAAA 1140

GGTGGCATAT CACTTACCTA GGGCCATAGC GTGGCATCAA GCTTGTCGTC GTCGTCCTTG 1200

TAGTCGCTCA CCACGCCCTG GCACAGCAGC AGGTTGCTCA CCACCAGCAG CAGCAGCAGT 1260

CTAGATCCCT TCTGGCTCGA GCCCTTGGAG TCCATGGTGG 1300
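
Comparing the two 1300-base listings suggests that SEQ ID NO: 37 is the opposite strand of SEQ ID NO: 36. The following minimal sketch checks that relationship on short excerpts only. The excerpt strings are transcribed from the scanned listings with obvious scanning errors corrected, and the four-base offset between the strand ends is an observation from these excerpts rather than something the listing states. The function and variable names are hypothetical.

```python
def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string, ignoring whitespace."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    bases = "".join(dna.split()).upper()
    return "".join(complement[base] for base in reversed(bases))


# First 50 bases of SEQ ID NO: 36 and last 40 bases of SEQ ID NO: 37,
# as transcribed from the scanned listing (assumed correct).
SEQ36_FIRST_50 = "AATTCCACCA TGGACTCCAA GGGCTCGAGC CAGAAGGGAT CTAGACTGCT"
SEQ37_LAST_40 = "CTAGATCCCT TCTGGCTCGA GCCCTTGGAG TCCATGGTGG"

# Assumption: the two strands are staggered by four bases at this end.
OFFSET = 4

top = "".join(SEQ36_FIRST_50.split())
paired = reverse_complement(SEQ37_LAST_40)
strands_pair = paired == top[OFFSET:OFFSET + len(paired)]
print(strands_pair)
```

If the excerpts are transcribed correctly, the reverse complement of the SEQ ID NO: 37 excerpt lines up with bases 5 to 44 of SEQ ID NO: 36.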
(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 423 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 

Met Asp Ser Lys Gly Ser Ser Gln Lys Gly Ser Arg Leu Leu Leu Leu
1 5 10 15

Leu Val Val Ser Asn Leu Leu Leu Cys Gln Gly Val Val Ser Asp Tyr
20 25 30

Lys Asp Asp Asp Asp Lys Leu Asp Ala Thr Leu Leu Trp Pro Phe Trp
35 40 45

Glu Asp Glu Glu Lys Asn Glu Ser Gly Leu Thr Glu Tyr Arg Leu Val
50 55 60

Ser Ile Asn Lys Ser Ser Pro Leu Gln Lys Gln Leu Pro Ala Phe Ile
65 70 75 80

Ser Glu Asp Ala Ser Gly Tyr Leu Gly Tyr Asn Ile Leu Arg Val Leu
85 90 95

Ile Trp Phe Ile Ser Ile Leu Ala Ile Thr Gly Asn Ile Ile Val Leu
100 105 110

Val Ile Leu Thr Thr Ser Gln Tyr Lys Leu Thr Val Pro Arg Phe Leu
115 120 125

Met Cys Asn Leu Ala Phe Ala Asp Leu Cys Ile Gly Ile Tyr Leu Leu
130 135 140

Leu Ile Ala Ser Val Asp Ile His Thr Lys Ser Gln Tyr His Asn Tyr
145 150 155 160

Ala Ile Asp Trp Gln Thr Gly Ala Gly Cys Asp Ala Ala Gly Phe Phe
165 170 175

Thr Val Phe Ala Ser Glu Leu Ser Val Tyr Thr Leu Thr Ala Ile Thr
180 185 190

Leu Glu Arg Trp His Thr Ile Thr His Ala Met Gln Leu Asp Cys Lys
195 200 205

Val Gln Leu Arg His Ala Ala Ser Val Met Val Met Gly Trp Ile Phe
210 215 220

Ala Phe Ala Ala Ala Leu Phe Pro Ile Phe Gly Ile Ser Ser Tyr Met
225 230 235 240

Lys Val Ser Ile Cys Leu Pro Met Asp Ile Asp Ser Pro Leu Ser Gln
245 250 255

Leu Tyr Val Met Ser Leu Leu Val Leu Asn Val Leu Ala Phe Val Val
260 265 270

Ile Cys Gly Cys Tyr Ile His Ile Tyr Leu Thr Val Arg Asn Pro Asn
275 280 285

Ile Val Ser Ser Ser Ser Asp Thr Arg Ile Ala Lys Arg Met Ala Met
290 295 300

Leu Ile Phe Thr Asp Phe Leu Cys Met Ala Pro Ile Ser Phe Phe Ala
305 310 315 320

Ile Ser Ala Ser Leu Lys Val Pro Leu Ile Thr Val Ser Lys Ala Lys
325 330 335

Ile Leu Leu Val Leu Phe His Pro Ile Asn Ser Cys Ala Asn Pro Phe
340 345 350

Leu Tyr Ala Ile Phe Thr Lys Asn Phe Arg Arg Asp Phe Phe Ile Leu
355 360 365

Leu Ser Lys Cys Gly Cys Tyr Glu Met Gln Ala Gln Ile Tyr Arg Thr
370 375 380

Glu Thr Ser Ser Thr Val His Asn Thr His Pro Arg Asn Gly His Cys
385 390 395 400

Ser Ser Ala Pro Arg Val Thr Asn Gly Ser Thr Tyr Ile Leu Val Pro
405 410 415

Leu Ser His Leu Ala Gln Asn
420

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
GTGGTCAGCA ACCCCAATGA TAAATATGAA CCCTT 
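
As a cross-check between the nucleotide and peptide listings, the following minimal sketch translates SEQ ID NO: 39 with the standard genetic code. It assumes the reading frame starts at the first base, which the listing does not state. The table construction and the `translate` helper are illustrative names, not part of the disclosure.

```python
# Standard genetic code in the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    first + second + third: AMINO[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate complete codons of a DNA string into one-letter residues."""
    bases = "".join(dna.split()).upper()
    codons = (bases[i:i + 3] for i in range(frame, len(bases) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


SEQ_ID_NO_39 = "GTGGTCAGCA ACCCCAATGA TAAATATGAA CCCTT"

# Assumption: frame 0; the trailing two bases form an incomplete codon
# and are ignored.
peptide = translate(SEQ_ID_NO_39)
print(peptide)
```

Under that frame assumption the result reads Val Val Ser Asn Pro Asn Asp Lys Tyr Glu Pro, the same residues that flank the Xaa positions in SEQ ID NO: 41.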
(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
GCTCACCACG CC

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

Gly Val Val Ser Xaa Xaa Xaa Xaa Xaa Asn Pro Asn Asp Lys Tyr Glu
1 5 10 15

Pro Phe

(2) INFORMATION FOR SEQ ID NO:42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:

Thr Leu Asp Pro Arg Xaa Xaa Xaa Xaa Xaa Asn Pro Asn Asp Lys Tyr
1 5 10 15

Glu Pro Phe 



(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: 

Ser Phe Leu Leu Arg Asn 
1 5 

(2) INFORMATION FOR SEQ ID NO:44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 429 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:

Met Asp Ser Lys Gly Ser Ser Gln Lys Gly Ser Arg Leu Leu Leu Leu
1 5 10 15

Leu Val Val Ser Asn Leu Leu Leu Cys Gln Gly Val Val Ser Asp Tyr
20 25 30

Lys Asp Asp Asp Asp Lys Leu Asp Ala Thr Leu Asp Pro Arg Xaa Xaa
35 40 45

Xaa Xaa Xaa Asn Pro Asn Asp Lys Tyr Glu Pro Phe Trp Glu Asp Glu
50 55 60

Glu Lys Asn Glu Ser Gly Leu Thr Glu Tyr Arg Leu Val Ser Ile Asn
65 70 75 80

Lys Ser Ser Pro Leu Gln Lys Gln Leu Pro Ala Phe Ile Ser Glu Asp
85 90 95

Ala Ser Gly Tyr Leu Val Leu Tyr Tyr Leu Ala Ile Val Gly His Ser
100 105 110

Leu Ser Ile Phe Thr Leu Val Ile Ser Leu Gly Ile Phe Val Phe Phe
115 120 125

Arg Ser Leu Gly Cys Gln Arg Val Thr Leu His Lys Asn Met Phe Leu
130 135 140

Thr Tyr Ile Leu Asn Ser Met Ile Ile Ile Ile His Leu Val Glu Val
145 150 155 160

Val Pro Asn Gly Glu Leu Val Arg Arg Asp Pro Val Ser Cys Lys Ile
165 170 175

Leu His Phe Phe His Gln Tyr Met Met Ala Cys Asn Tyr Phe Trp Met
180 185 190

Leu Cys Glu Gly Ile Tyr Leu His Thr Leu Ile Val Val Ala Val Phe
195 200 205

Thr Glu Lys Gln Arg Leu Arg Trp Tyr Tyr Leu Leu Gly Trp Gly Phe
210 215 220

Pro Leu Val Pro Thr Thr Ile His Ala Ile Thr Arg Ala Val Tyr Phe
225 230 235 240

Asn Asp Asn Cys Trp Leu Ser Val Glu Thr His Leu Leu Tyr Ile Ile
245 250 255

His Gly Pro Val Met Ala Ala Leu Val Val Asn Phe Phe Phe Leu Leu
260 265 270

Asn Ile Val Arg Val Leu Val Thr Lys Met Arg Glu Thr His Glu Ala
275 280 285

Glu Ser His Met Tyr Leu Lys Ala Val Lys Ala Thr Met Ile Leu Val
290 295 300

Pro Leu Leu Gly Ile Gln Phe Val Val Phe Pro Trp Arg Pro Ser Asn
305 310 315 320

Lys Met Leu Gly Lys Ile Tyr Asp Tyr Val Met His Ser Leu Ile His
325 330 335

Phe Gln Gly Phe Phe Val Ala Thr Ile Tyr Cys Phe Cys Asn Asn Glu
340 345 350

Val Gln Thr Thr Val Lys Arg Gln Trp Ala Gln Phe Lys Ile Gln Trp
355 360 365

Asn Gln Arg Trp Gly Arg Arg Pro Ser Asn Arg Ser Ala Arg Ala Ala
370 375 380

Ala Ala Ala Ala Glu Ala Gly Asp Ile Pro Ile Tyr Ile Cys His Gln
385 390 395 400

Glu Leu Arg Asn Glu Pro Ala Asn Asn Gln Gly Glu Glu Ser Ala Glu
405 410 415

Ile Ile Pro Leu Asn Ile Ile Glu Gln Glu Ser Ser Ala
420 425

WHAT IS CLAIMED IS:

1. A method of identifying peptide agonists 
or negative antagonists of a G protein coupled receptor 
of interest, said method comprising: 

expressing a peptide of a peptide library 
tethered to a G protein coupled receptor of interest in 
a cell; and 

monitoring said cell to determine whether the 
peptide is an agonist or negative antagonist of said G 
protein coupled receptor of interest. 

2. The method of claim 1 for identifying a 
peptide agonist wherein expressing a peptide of a 
peptide library tethered to a G protein coupled 
receptor of interest in a cell comprises: 

preparing a G protein coupled receptor 
construct for identifying a peptide agonist, the G 
protein coupled receptor construct comprising: 

a nucleic acid molecule encoding a G
protein coupled receptor with a deleted first
amino terminus;

a nucleic acid molecule encoding a 
second amino terminus of a self-activating
receptor attached to said nucleic acid
molecule encoding a G protein coupled receptor
at the deleted first amino terminus, said
second amino terminus having a deleted portion
which is a peptide agonist for activating said
self-activating receptor; and

a nucleic acid molecule encoding the 
peptide of the peptide library inserted into 
said second amino terminus and replacing said 
deleted portion; 

introducing the G protein coupled receptor
construct into a cell;

allowing said cell to express said G protein
coupled receptor encoded thereby; and

exposing said cell to a ligand of said
self-activating receptor, wherein said ligand cleaves
said G protein coupled receptor construct to expose
said inserted peptide of said peptide library.

3. The method of claim 2 wherein said
introducing comprises injecting said G protein coupled 
receptor construct into said cell. 

4. The method of claim 2 wherein said
introducing comprises transformation of said cell with 
an expression vector, said expression vector comprising 
said G protein coupled receptor construct. 

5. The method of claim 1 for identifying a 
peptide negative antagonist wherein said G protein 
coupled receptor of interest is a constitutively active 
G protein coupled receptor and wherein expressing a 
peptide of a peptide library tethered to the G protein 
coupled receptor of interest in a cell comprises: 

preparing a constitutively active G protein 
coupled receptor construct for identifying a peptide 
negative antagonist, the constitutively active G 
protein coupled receptor construct comprising: 

a nucleic acid molecule encoding a 
constitutively active G protein coupled 
receptor with a deleted first amino terminus; 

a nucleic acid molecule encoding a
second amino terminus of a self-activating
receptor attached to said nucleic acid
molecule encoding said constitutively active G
protein coupled receptor at the deleted first
amino terminus, said second amino terminus
having a deleted portion which includes a
peptide agonist for activating said self-
activating receptor as well as any amino acids
positioned amino terminally to the peptide
agonist; and

a nucleic acid molecule encoding the 
peptide of the peptide library inserted into 
said second amino terminus and replacing said 
deleted portion; 

introducing the constitutively active G 
protein coupled receptor construct into a cell; and 

allowing said cell to express said 
constitutively active G protein coupled receptor 
encoded thereby. 

6. The method of claim 5 wherein said 
introducing comprises injecting said constitutively 
active G protein coupled receptor construct into said 
cell. 

7. The method of claim 5 wherein said 
introducing comprises transformation of said cell with
an expression vector, said expression vector comprising
said constitutively active G protein coupled receptor
construct.

8. The method of claim 1 for identifying a 
peptide agonist wherein expressing a peptide of a 
peptide library tethered to the G protein coupled 
receptor of interest in a cell comprises: 

preparing a G protein coupled receptor 
construct for identifying a peptide agonist, the G 
protein coupled receptor construct comprising: 

a nucleic acid molecule encoding a G
protein coupled receptor with a deleted first
amino terminus;

a nucleic acid molecule encoding a
second amino terminus of a self-activating
receptor attached to said nucleic acid
molecule encoding said G protein coupled
receptor at the deleted first amino terminus,
said second amino terminus having a deleted
portion which includes a peptide agonist for
activating said self-activating receptor as
well as any amino acids positioned amino
terminally to the peptide agonist; and

a nucleic acid molecule encoding the 
peptide of the peptide library inserted into 
said second amino terminus and replacing said 
deleted portion; 

introducing the G protein coupled receptor 
construct into a cell; and 

allowing said cell to express said G protein 
coupled receptor encoded thereby. 

9. The method of claim 8 wherein said 
introducing comprises injecting said G protein coupled 
receptor construct into said cell. 

10. The method of claim 8 wherein said 
introducing comprises transformation of said cell with 
an expression vector, said expression vector comprising 
said G protein coupled receptor construct. 

11. The method of claim 1 wherein said G 
protein coupled receptor signals through an ion channel
pathway and wherein said monitoring comprises detecting 
levels of said ion within said cell. 

12. The method of claim 11 wherein said ion
channel pathway is a calcium channel. 

13. The method of claim 12 wherein said cell 
is a Xenopus oocyte and wherein said monitoring 
comprises voltage clamp analysis.

14. The method of claim 1 wherein said G
protein coupled receptor signals through a cyclic
adenosine monophosphate pathway and wherein said
monitoring comprises detecting levels of cyclic 
adenosine monophosphate within said cell. 



15. A G protein coupled receptor construct
for identifying a peptide agonist of the G protein 
coupled receptor, the construct comprising: 

a nucleic acid molecule encoding a G protein 
coupled receptor with a deleted first amino terminus; 

a nucleic acid molecule encoding a second 
amino terminus of a self-activating receptor attached
to said nucleic acid molecule encoding a G protein
coupled receptor at the deleted first amino terminus,
said second amino terminus having a deleted portion 
which is a peptide agonist for activating said self- 
activating receptor; and 

a nucleic acid molecule encoding a peptide of 
a peptide library inserted into said second amino 
terminus and replacing said deleted portion. 

16. The G protein coupled receptor construct 
of claim 15 wherein said self-activating receptor is a
thrombin receptor.

17. The G protein coupled receptor construct 
of claim 16 wherein said second amino terminus of a 
nucleic acid molecule encoding a thrombin receptor 
encodes an amino acid sequence as shown in SEQ ID NO: 1,
and wherein amino acid residues 9 to 13 of SEQ ID NO: 1
are the portion which is a peptide agonist for said 
thrombin receptor. 

18. The G protein coupled receptor construct
of claim 15 wherein the G protein coupled receptor is a
human calcitonin receptor.

19. The G protein coupled receptor construct
of claim 18 wherein said construct has an amino acid
sequence as shown in SEQ ID NO: 44, and wherein amino
acid residues 47 to 51 of SEQ ID NO: 44 are the peptide
of a peptide library, amino acid residues 1 to 101 of
SEQ ID NO: 44 are the second amino terminus, and amino 
acid residues 102 to 429 of SEQ ID NO: 44 are the 
nucleic acid molecule encoding the human calcitonin 
receptor with said first amino terminus deleted. 

20. The G protein coupled receptor construct 
of claim 15 wherein the G protein coupled receptor is a 
human follicle stimulating hormone receptor. 

21. The G protein coupled receptor construct 
of claim 20 wherein said construct has an amino acid 
sequence as shown in SEQ ID NO: 2, and wherein amino 
acid residues 47 to 51 of SEQ ID NO: 2 are the peptide 
of a peptide library, amino acid residues 39 to 101 of 
SEQ ID NO: 2 are the second amino terminus, and amino 
acid residues 102 to 436 of SEQ ID NO: 2 are the nucleic
acid molecule encoding the human follicle stimulating 
hormone receptor with said first amino terminus 
deleted. 

22. A cell comprising the G protein coupled
receptor construct of claim 15.

23. The cell of claim 22 wherein the cell is 
a Xenopus oocyte. 

24. An expression vector comprising the G 
protein coupled receptor construct of claim 15. 

25. The expression vector of claim 24
wherein said expression vector is selected from the 
group consisting of a plasmid and a virus. 




26. A cell comprising the expression vector
of claim 24. 

27. A constitutively active G protein 
coupled receptor construct for identifying a peptide 
negative antagonist of the constitutively active G 
protein coupled receptor, the construct comprising: 

a nucleic acid molecule encoding a 
constitutively active G protein coupled receptor with a 
deleted first amino terminus; 

a nucleic acid molecule encoding a second 
amino terminus of a self-activating receptor attached
to said nucleic acid molecule encoding a constitutively
active G protein coupled receptor at the deleted first
amino terminus, said second amino terminus having a
deleted portion which includes a peptide agonist for
activating said self-activating receptor as well as any
amino acids positioned amino terminally to the peptide
agonist; and

a nucleic acid molecule encoding a peptide of 
a peptide library inserted into said second amino 
terminus and replacing said deleted portion. 

28. The constitutively active G protein 
receptor construct of claim 27 wherein said self- 
activating receptor is a thrombin receptor. 

29. The constitutively active G protein 
coupled receptor construct of claim 28 wherein said 
second amino terminus of a nucleic acid molecule 
encoding a thrombin receptor encodes an amino acid 
sequence as shown in SEQ ID NO: 1, and wherein amino
acid residues 9 to 13 of SEQ ID NO: 1 are the portion
which is a peptide agonist for said thrombin receptor.

30. A cell comprising the constitutively
active G protein coupled receptor construct of claim 
27. 

31. The cell of claim 30 wherein the cell is 
a Xenopus oocyte. 

32. The cell of claim 30 wherein the cell is 
a yeast cell.

33. An expression vector comprising the 
constitutively active G protein coupled receptor 
construct of claim 27. 

34. The expression vector of claim 33
wherein said expression vector is selected from the
group consisting of a plasmid and a virus.

35. A cell comprising the expression vector 
of claim 34.

36. A G protein coupled receptor construct
for identifying a peptide agonist of the G protein 
coupled receptor, the construct comprising: 

a nucleic acid molecule encoding a G protein 
coupled receptor with a deleted first amino terminus; 

a nucleic acid molecule encoding a second 
amino terminus of a self-activating receptor attached
to said nucleic acid molecule encoding a G protein
coupled receptor at the deleted first amino terminus,
said second amino terminus having a deleted portion
which includes a peptide agonist for activating said
self-activating receptor as well as any amino acids
positioned amino terminally to the peptide agonist; and 

a nucleic acid molecule encoding a peptide of 
a peptide library inserted into said second amino
terminus and replacing said deleted portion.

37. The G protein receptor construct of
claim 36 wherein said self-activating receptor is a
thrombin receptor. 

38. The G protein coupled receptor construct 
of claim 37 wherein said second amino terminus of a 
nucleic acid molecule encoding a thrombin receptor
encodes an amino acid sequence as shown in SEQ ID NO: 1,
and wherein amino acid residues 9 to 13 of SEQ ID NO: 1
are the portion which is a peptide agonist for said 
thrombin receptor. 

39. The G protein coupled receptor construct 
of claim 36 wherein the G protein coupled receptor is a 
human calcitonin receptor. 

40. The G protein coupled receptor construct
of claim 39 wherein said construct has an amino acid
sequence as shown in SEQ ID NO: 44, and wherein amino 
acid residues 47 to 51 of SEQ ID NO: 44 are the peptide 
of a peptide library, amino acid residues 1 to 101 of 
SEQ ID NO: 44 are the second amino terminus, and amino 
acid residues 102 to 429 of SEQ ID NO: 44 are the 
nucleic acid molecule encoding the human calcitonin 
receptor with said first amino terminus deleted. 

41. The G protein coupled receptor construct 
of claim 36 wherein the G protein coupled receptor is a
human follicle stimulating hormone receptor. 

42. The G protein coupled receptor construct 
of claim 41 wherein said construct has an amino acid 
sequence as shown in SEQ ID NO: 2, and wherein amino 
acid residues 47 to 51 of SEQ ID NO: 2 are the peptide 
of a peptide library, amino acid residues 39 to 101 of 
SEQ ID NO: 2 are the second amino terminus, and amino 
acid residues 102 to 436 of SEQ ID NO: 2 are the nucleic
acid molecule encoding the human follicle stimulating
hormone receptor with said first amino terminus
deleted.

43. A cell comprising the G protein coupled 
receptor construct of claim 36.

44. The cell of claim 43 wherein the cell is 
a yeast cell.

45. An expression vector comprising the G 
protein coupled receptor construct of claim 36. 

46. The expression vector of claim 45 
wherein said expression vector is selected from the 
group consisting of a plasmid and a virus. 

47. A cell comprising the expression vector 
of claim 45.
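
The residue coordinates recited in claims 19 and 40 (residues 1 to 101 of SEQ ID NO: 44 as the second amino terminus, residues 47 to 51 as the library peptide) can be illustrated with a short sketch. It is a sketch under stated assumptions, not the disclosed cloning procedure: the one-letter string is transcribed from the scanned listing, the cleavage point is assumed to fall immediately before residue 47 because claim 2 says cleavage exposes the inserted peptide, the example peptide is the first five residues of SEQ ID NO: 43, and the library size assumes 20 standard amino acids at each Xaa position. All function names are hypothetical.

```python
# Residues 1-101 of SEQ ID NO: 44 in one-letter code, transcribed from the
# scanned listing (X marks the Xaa library positions).
N_TERMINUS = (
    "MDSKGSSQKGSRLLLL"
    "LVVSNLLLCQGVVSDY"
    "KDDDDKLDATLDPRXX"
    "XXXNPNDKYEPFWEDE"
    "EKNESGLTEYRLVSIN"
    "KSSPLQKQLPAFISED"
    "ASGYL"
)

# 1-based, inclusive coordinates of the library peptide, per claims 19 and 40.
LIBRARY_START, LIBRARY_END = 47, 51


def insert_library_peptide(n_terminus: str, peptide: str) -> str:
    """Replace the Xaa positions with one member of the peptide library."""
    start = LIBRARY_START - 1
    if len(peptide) != LIBRARY_END - start:
        raise ValueError("peptide must match the length of the library slot")
    return n_terminus[:start] + peptide + n_terminus[LIBRARY_END:]


def exposed_after_cleavage(n_terminus: str) -> str:
    """Hypothetical: new amino terminus if cleavage occurs just before residue 47."""
    return n_terminus[LIBRARY_START - 1:]


# Assumption: each Xaa may be any of the 20 standard amino acids.
library_size = 20 ** (LIBRARY_END - LIBRARY_START + 1)

# Example member: first five residues of SEQ ID NO: 43 (Ser Phe Leu Leu Arg).
construct = insert_library_peptide(N_TERMINUS, "SFLLR")
print(library_size)
print(exposed_after_cleavage(construct)[:14])
```

Under these assumptions the slot holds 3,200,000 possible pentapeptides, and the tethered peptide sits at the very start of the exposed amino terminus.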